CERN Accelerating science

Rucio crosses the exabyte horizon

When we last wrote in these pages about Rucio, ATLAS had just passed the half-exabyte mark and the community counted around thirty organisations. Six years on, a threshold that was then a distant projection has been crossed: ATLAS and CMS, the two largest LHC experiments, now each manage more than one exabyte of data with Rucio. Alongside this milestone, the project has formalised the Rucio Advisory Board, CERN IT has launched Rucio-as-a-Service for small and medium-sized experiments (SMEs), three EU-funded projects are extending the system into new domains, and the next Community Workshop is heading to Lyon.

Two experiments, more than two exabytes

Rucio is an open-source framework, developed at CERN and supported by a worldwide community, that organises, manages, and provides access to scientific data held on geographically distributed storage. First put into production for ATLAS in December 2014, it has since become the data management system of choice for a growing list of large collaborations.

Crossing one exabyte per experiment is more than a round number. Data volumes of this scale are not served from a single site: in the model of the Worldwide LHC Computing Grid (WLCG), they are deliberately distributed across the collaborating institutes, spreading storage costs and stewardship across the member states, placing data close to the computing that consumes it, and ensuring that no single site failure can endanger the data. ATLAS reached this milestone during Run-3, with its data spread over more than 120 sites worldwide. CMS, having migrated from its legacy PhEDEx system to Rucio in preparation for Run-3, has now followed and likewise crossed the exabyte mark. In both experiments, Rucio is the system that makes this distribution workable: it orchestrates daily transfers measured in tens of petabytes and shields physicists from the heterogeneity of the underlying storage, whether disk, tape, HPC or commercial cloud. With LHC Run-4 on the horizon, this is best understood not as an endpoint but as the start of a new operational regime.
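Rucio expresses data placement through replication rules, which request a given number of copies of a dataset on Rucio Storage Elements (RSEs) selected by an expression. The toy sketch below is not Rucio code and does not use its API; under simplified assumptions (hypothetical site names and capacities, and a naive "most free space first" policy), it only illustrates why placing copies on distinct sites means that no single site failure can lose the data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Site:
    """A hypothetical storage site, loosely standing in for a Rucio RSE."""
    name: str
    tier: int        # WLCG tier level
    free_tb: float   # free capacity in terabytes


def place_replicas(sites: List[Site], copies: int,
                   eligible: Callable[[Site], bool], size_tb: float) -> List[str]:
    """Choose `copies` distinct eligible sites, preferring the most free space.

    Toy policy for illustration only; Rucio's actual rule evaluation is
    considerably more involved.
    """
    candidates = [s for s in sites if eligible(s) and s.free_tb >= size_tb]
    if len(candidates) < copies:
        raise ValueError("not enough eligible sites to satisfy the rule")
    candidates.sort(key=lambda s: s.free_tb, reverse=True)
    return [s.name for s in candidates[:copies]]


def survives_single_site_failure(placement: List[str]) -> bool:
    """True if losing any one site still leaves at least one replica elsewhere."""
    return bool(placement) and all(
        any(other != failed for other in placement) for failed in placement
    )


# Hypothetical sites; names and numbers are made up for the example.
sites = [
    Site("SITE-A", tier=1, free_tb=500.0),
    Site("SITE-B", tier=1, free_tb=300.0),
    Site("SITE-C", tier=2, free_tb=800.0),
    Site("SITE-D", tier=1, free_tb=50.0),
]

if __name__ == "__main__":
    # A rule asking for two copies of a 100 TB dataset on Tier-1 sites.
    placement = place_replicas(sites, copies=2,
                               eligible=lambda s: s.tier == 1, size_tb=100.0)
    print(placement)                                # ['SITE-A', 'SITE-B']
    print(survives_single_site_failure(placement))  # True
```

Losing either chosen site still leaves a copy at the other, whereas a single-copy placement, or two copies on the same site, fails the check.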

RUCIO data management


A Rucio Advisory Board to match a community of communities

Rucio long ago outgrew its origins as a single-experiment tool. It is now critical infrastructure for a whole portfolio of flagship science programmes, among them ATLAS, CMS, DUNE, Belle II, LIGO/Virgo, the Rubin Observatory, CTAO and SKA, each with its own timeline, priorities and funding realities. That success raises a strategic question: how does a project serving many collaborations decide where to invest its limited development effort, and how do the communities that depend on it gain confidence in its long-term direction? The newly established Rucio Advisory Board is the answer. Bringing together senior representatives of the major communities running Rucio in production, the Board gives the experiments a direct voice in shaping the project's roadmap, helps balance priorities when needs diverge, and anchors the long-term sustainability of a system on which the coming decade of data-intensive science will depend. It signals that Rucio is managed not just as a piece of software but as shared scientific infrastructure, with a planning horizon to match.

Rucio-as-a-Service for Small and Medium Experiments

Not every collaboration has the means to operate a Rucio deployment of its own. To bridge that gap, CERN IT established a Rucio-as-a-Service offering for Small and Medium Experiments (SMEs) in 2026. Built on a modern Kubernetes and ArgoCD foundation, with integrated secrets management and automated DNS, the service lets a smaller CERN community obtain a production-grade Rucio instance without having to assemble the underlying middleware stack itself. The service is currently being validated through prototype deployments with its first pilot communities, including AMS and SHiP, whose experience is shaping the templates, defaults and operational procedures that future tenants will inherit. Notably, the pilots also include the FCC community: the data of the Future Circular Collider studies is handled through such a prototype instance. The service also provides a natural entry point for partners of ESCAPE and for other communities evaluating the CERN scientific computing stack. ESCAPE (the European Science Cluster of Astronomy and Particle physics ESFRI research infrastructures) is an open collaboration that brings together Europe's major particle physics and astronomy facilities around a shared open-science data infrastructure, in which Rucio serves as the backbone of the common "Data Lake". In a sense, the service is the logical next step for a project that began with a single experiment: making the same exabyte-grade technology available to smaller CERN experiments.

EU-funded projects extending Rucio

Three EU-funded projects in which CERN is involved are currently pushing Rucio into new territory. DaFab applies and extends Rucio to Earth Observation, using Copernicus satellite data together with AI and HPC; it has driven the introduction of structured JSON metadata, schema governance and a composable filtering language, turning metadata from a supporting detail into a first-class design axis of Rucio. RI-SCALE (Unlocking Research Infrastructure potential with Scalable AI and Data) builds on Rucio to create scalable Data Exploitation Platforms for European Research Infrastructures, co-hosting their data with AI frameworks and computing resources to accelerate work in environmental and life sciences. The OSCARS Rucio Open Data project is adding native open-data support to Rucio: rather than copying data into separate FAIR-compliant systems, communities will be able to define embargo and public-access policies directly on their existing Rucio-managed collections, reducing duplication and operational cost. Taken together, these projects show how Rucio is increasingly becoming an infrastructure component of European Open Science, well beyond its high-energy physics roots.
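The idea of a composable filtering language over structured metadata can be illustrated with small predicates that combine through AND and OR. The sketch below is hypothetical: it does not reproduce Rucio's actual filter syntax or API, and the Earth Observation records and field names (`mission`, `cloud_cover`, `polarisation`) are invented for the example.

```python
import json
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Predicate = Callable[[Metadata], bool]


def eq(key: str, value: Any) -> Predicate:
    """Match records whose `key` equals `value`."""
    return lambda md: md.get(key) == value


def lt(key: str, value: float) -> Predicate:
    """Match records that have `key` and whose value is below `value`."""
    return lambda md: key in md and md[key] < value


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND of several predicates."""
    return lambda md: all(p(md) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR of several predicates."""
    return lambda md: any(p(md) for p in preds)


def select(records: List[Metadata], query: Predicate) -> List[str]:
    """Return the names of the records matching `query`."""
    return [md["name"] for md in records if query(md)]


# Invented JSON metadata for three Earth Observation products.
records = [json.loads(doc) for doc in (
    '{"name": "scene-001", "mission": "Sentinel-2", "cloud_cover": 12.5}',
    '{"name": "scene-002", "mission": "Sentinel-2", "cloud_cover": 71.0}',
    '{"name": "scene-003", "mission": "Sentinel-1", "polarisation": "VV"}',
)]

# Sentinel-2 scenes under 20% cloud cover, or any Sentinel-1 product.
query = any_of(
    all_of(eq("mission", "Sentinel-2"), lt("cloud_cover", 20.0)),
    eq("mission", "Sentinel-1"),
)

if __name__ == "__main__":
    print(select(records, query))  # ['scene-001', 'scene-003']
```

Because each condition is an ordinary function, new conditions can be combined without changing the evaluator, which is what "composable" means in this toy.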

Next stop: Lyon, October 2026

The Rucio Community Workshop has become a fixture of the community's calendar, and its recent venues trace the project's expanding reach: after the joint DIRAC & Rucio workshop in Japan in 2023, the 7th edition was hosted by the San Diego Supercomputer Center in the United States in 2024, and the 8th by the SKA Observatory at its Global Headquarters at Jodrell Bank in 2025. In 2026 the workshop returns to continental Europe: the 9th Rucio Community Workshop will be hosted by the IN2P3 Computing Centre (CC-IN2P3) in Lyon from 26 to 30 October 2026. As a long-standing WLCG Tier-1 site for ATLAS, CMS and LHCb, CC-IN2P3 is a fitting venue, a place where the abstractions of Rucio meet tape robots, disk pools and transatlantic links every day. The programme will cover the state of the project, deployment evolution, token-based authentication, HPC and cloud integration, AI, the Rucio-as-a-Service offering, and the role of Rucio in open data and FAIR workflows, alongside hands-on tutorials and the now-traditional hackathon. The workshop is open to anyone interested in scientific data management.

RUCIO 2026 Poster

Further reading 

Rucio website: https://rucio.cern 

Rucio Community Workshop 2026, Lyon: https://rucio.cern/2026