Page one of Jacob Shallus' officially engrossed copy of the U.S. Constitution signed in Philadelphia by delegates of the Constitutional Convention in 1787. (image: Wikipedia)

A century ago, the section of U.S. federal law governing public health and welfare was relatively small and loosely connected to the rest of the legal system. Today, it is one of the largest and most interconnected parts of the United States Code.

That shift is one of many patterns revealed by a new dataset published in the journal Scientific Data, which reconstructs the U.S. Code — the official compilation of federal statutory law — from 1926 to 2023.

Developed by researchers at the Santa Fe Institute and collaborators, the dataset offers the most comprehensive, data-ready picture yet of how federal law has grown, reorganized, and become more interconnected over the past century.

“The U.S. Code is more than a set of rules,” says Hyejin Youn, an SFI External Professor based at Seoul National University and senior author on the study. “It’s a record of what society has decided is important enough to regulate — and how those priorities evolve as society becomes more complex.”

For James Holehouse, an SFI Postdoctoral Fellow and co-lead author, the code stood out as a uniquely rich case. “Many countries don’t have a single, codified body of law like this,” he says. “The U.S. does, and it’s about a hundred years old. That gives us a rare opportunity to study how one of the world’s largest regulatory systems has changed.”

Making the study possible meant first rebuilding the past. Many early versions of the U.S. Code exist only as scanned pages, riddled with errors from early text-scanning software. Co-lead author Dawoon Jeong, a computational social scientist at the Knowledge Lab, University of Chicago, led the effort to clean and reconstruct those records using artificial intelligence in a carefully controlled process.

“Before this, we didn’t really have usable data for studying legal change over such a long period,” Jeong says. “Once we could reliably recover the old texts, we could finally look at how the legal system evolved as a whole.”

The resulting dataset captures legal complexity on three levels: textual change, including word counts and vocabulary growth; hierarchical structure, showing how titles branch into chapters and sections; and networks of cross-references linking different areas of law.
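To make those three levels concrete, here is a minimal Python sketch run on a few made-up records. It is not the researchers' pipeline and does not use the published dataset's schema: the record layout, the sample texts, the simplified citation pattern, and all function names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy records: (title number, section id, section text).
# This layout is an assumption for illustration, not the dataset's real schema.
SECTIONS = [
    (42, "201", "The Service shall cooperate under 21 U.S.C. 301 and 6 U.S.C. 101."),
    (21, "301", "This chapter may be cited as the Act. See 42 U.S.C. 201."),
    (6, "101", "In this chapter, the term agency has the meaning given in 42 U.S.C. 201."),
]

# Simplified pattern for citations of the form "42 U.S.C. 201" (an assumption;
# real statutory cross-references take many more forms).
CITATION = re.compile(r"(\d+)\s+U\.S\.C\.\s+(\d+)")
WORD = re.compile(r"[A-Za-z]+")


def text_stats(sections):
    """Textual level: return (total word count, vocabulary size)."""
    words = [w.lower() for _, _, text in sections for w in WORD.findall(text)]
    return len(words), len(set(words))


def sections_per_title(sections):
    """Hierarchical level: count how many sections sit under each title."""
    return Counter(title for title, _, _ in sections)


def title_edges(sections):
    """Network level: count references as (source title, target title) pairs."""
    edges = Counter()
    for title, _, text in sections:
        for target_title, _ in CITATION.findall(text):
            edges[(title, int(target_title))] += 1
    return edges


def in_degree(edges):
    """Total number of incoming references received by each title."""
    counts = Counter()
    for (_, target), n in edges.items():
        counts[target] += n
    return counts


if __name__ == "__main__":
    print(text_stats(SECTIONS))
    print(sections_per_title(SECTIONS))
    print(in_degree(title_edges(SECTIONS)))
```

In this toy example, the public health title (42) receives references from both other titles, a small-scale version of the kind of interconnection the dataset tracks across a century of the code.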

Taken together, the data reveal how U.S. law shifts in response to events and priorities. Two examples are the growing role of public health, from disease control to food safety, and the creation of Title 6 (Domestic Security) after 9/11.

The team emphasizes that the dataset is a foundation rather than a conclusion. Future work will focus on modeling why some legal domains change quickly, how growing interdependence affects adaptation, and whether legal complexity can keep pace with societal complexity.

“As rules grow and institutions come under pressure,” Holehouse says, “it’s increasingly important to understand how these systems evolve before we try to change them.”

This material is based upon work supported by the U.S. National Science Foundation under Award No. 2526746.

Read the study, "A Dataset Showing a Century of Evolution in the Complexity of the United States Legal Code," in Scientific Data (January 6, 2026). DOI: 10.1038/s41597-025-06313-w