Speranza: A New System for Ensuring Security and Maintaining Anonymity for Open Source Software Maintainers

2023-12-04

Open source software - software that is distributed for free, along with its source code, so that it can easily be copied, extended, or modified - is "ubiquitous," according to the 2023 Open Source Security and Risk Analysis Report. 96% of computer programs used in major industries contain open source software, and open source code makes up 76% of the code in those programs. However, the report warns that the percentage of software packages containing security vulnerabilities remains worryingly high.

One concern is that "the software you receive from a trusted developer may have been compromised in some way," said Kelsey Merrill, a software engineer who recently obtained a master's degree from MIT's Department of Electrical Engineering and Computer Science. "Assume that somewhere in the supply chain, the software has been altered by an attacker with malicious intent."

This type of security vulnerability is not abstract. To take an infamous example from 2020, the Texas-based company SolarWinds prepared a software update for its widely used program called Orion. Hackers infiltrated the system and inserted malicious code into the software before SolarWinds sent the latest version of Orion to over 18,000 customers, including Microsoft, Intel, and about 100 other companies, as well as several US government agencies such as the State Department, Department of Defense, Department of the Treasury, Department of Commerce, and Department of Homeland Security.

In this case, the compromised product came from a large commercial company, but Merrill states that the possibility of mistakes occurring is even greater in the open source domain, where software is published by individuals from different backgrounds, many of whom lack any security training, and is used globally.

Together with three collaborators - her former advisor, Karen Sollins, Chief Scientist at MIT's Computer Science and Artificial Intelligence Laboratory; Santiago Torres-Arias, Assistant Professor of Computer Science at Purdue University; and Zachary Newman, a former MIT graduate student and current Research Scientist at Chainguard Labs - she has developed a new system called Speranza. The purpose of Speranza is to assure software consumers that the products they receive have not been tampered with and come directly from trusted sources. Their paper has been published on the arXiv preprint server.

"What we have done," explained Sollins, "is develop, prove correct, and demonstrate the feasibility of a method that allows [software] maintainers to remain anonymous." Maintaining anonymity is crucial, considering that almost everyone, including software developers, values their privacy. Sollins added that this new method "also allows [software] users to trust that the maintainers are indeed legitimate maintainers and that the code being downloaded is indeed the correct code from that maintainer."

So, how can users verify the authenticity of a software package and confirm that, as Merrill put it, "the maintainers are the people they say they are"? The traditional way of doing this, invented over 40 years ago, is through digital signatures, which are similar to handwritten signatures but with far greater built-in security through the use of various cryptographic techniques.

To create a digital signature, two "keys" are generated - each key is a number, composed of zeros and ones, that is 256 bits long. One key is designated as "private" and the other as "public," but together they form a mathematically related pair.

Software developers can use their private key, along with the content of a document or computer program, to generate a digital signature uniquely associated with that document or program. Then, software users can use the public key, along with the developer's signature and the content of the package they downloaded, to verify the authenticity of the package.

The result of the verification is presented as a yes or no, 1 or 0. "Getting a 1 means that authenticity has been ensured. The document matches the signature, so nothing has been altered. Getting a 0 means that something is wrong, and you may not want to rely on that document," explained Merrill.
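The sign-and-verify flow described above can be sketched in a few lines of Python. This is a minimal illustration rather than the scheme any particular tool uses: it assumes a Schnorr-style signature over a deliberately tiny group, and the constants P, Q, and G as well as all function names are hypothetical choices made for this sketch. Real systems use keys of 256 bits or more and a vetted cryptographic library.

```python
import hashlib
import secrets

# Toy group parameters (assumption, illustration only): P = 2*Q + 1 with
# P and Q both prime, and G generating the subgroup of order Q.
# These numbers are far too small to offer any real security.
P, Q, G = 2039, 1019, 4


def _challenge(commit: int, message: bytes) -> int:
    """Hash the nonce commitment together with the message content."""
    data = commit.to_bytes(2, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def generate_keypair() -> tuple[int, int]:
    """Return a mathematically related (private, public) key pair."""
    private = secrets.randbelow(Q - 1) + 1
    public = pow(G, private, P)
    return private, public


def sign(private: int, message: bytes) -> tuple[int, int]:
    """Produce a signature tied to both the private key and the message."""
    nonce = secrets.randbelow(Q - 1) + 1
    e = _challenge(pow(G, nonce, P), message)
    s = (nonce + e * private) % Q
    return e, s


def verify(public: int, message: bytes, signature: tuple[int, int]) -> int:
    """Return 1 if the signature matches the message, otherwise 0."""
    e, s = signature
    # Recover the signer's nonce commitment: G^s * public^(-e) mod P.
    commit = pow(G, s, P) * pow(public, (-e) % Q, P) % P
    return 1 if _challenge(commit, message) == e else 0
```

A maintainer would call sign(private, package_bytes) before publishing, and a user would call verify(public, downloaded_bytes, signature) after downloading. Changing even one byte of the package changes the hash, so the check returns 0.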

Although this decades-old method has been tried and tested to some extent, it is far from perfect. One issue pointed out by Merrill is that "people are not good at securely managing encryption keys, which are long numbers, and preventing them from being lost." People often lose passwords. "If a software developer loses their private key and then contacts users saying, 'Hey, I have a new key,' how can you know that it's really them?"

To address these concerns, Speranza is built upon "Sigstore" - a system launched last year to enhance software supply chain security. Sigstore was developed by Zachary Newman and Santiago Torres-Arias, who initiated the Speranza project, along with John Speed Meyers from Chainguard Labs. Sigstore automates and simplifies the digital signature process. Users no longer need to manage long cryptographic keys; instead, they are issued temporary keys (an approach called "keyless signing") that expire quickly, possibly within minutes, and therefore never need to be stored.
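The idea of a key that expires within minutes can be illustrated with a small sketch. It is not Sigstore's actual data model: the EphemeralCertificate class, its field names, and the ten-minute default lifetime are assumptions made for illustration. The point is only that a verifier checks whether the signature was made inside the key's short validity window, so the signer never has to keep the key afterwards.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EphemeralCertificate:
    """Hypothetical record binding a temporary key to an identity."""
    subject: str            # identity vouched for by the identity provider
    public_key: int         # stand-in for the temporary public key
    not_before: datetime    # start of the validity window
    not_after: datetime     # end of the validity window


def issue_certificate(subject: str, public_key: int, now: datetime,
                      lifetime: timedelta = timedelta(minutes=10)
                      ) -> EphemeralCertificate:
    """Issue a certificate valid only for a short window (assumed 10 min)."""
    return EphemeralCertificate(subject, public_key, now, now + lifetime)


def was_valid_at(cert: EphemeralCertificate, signed_at: datetime) -> bool:
    """Check that a signature was made while the temporary key was valid."""
    return cert.not_before <= signed_at <= cert.not_after
```

How the signing time is recorded in a trustworthy way is outside the scope of this sketch.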

One drawback of Sigstore is that it dispenses with long-lasting public keys, so software maintainers instead have to identify themselves through a protocol called OpenID Connect (OIDC), in a way that can be linked to their email addresses. This feature alone could hinder widespread adoption of Sigstore, and it is the motivation behind Speranza. "We took the basic infrastructure of Sigstore and modified it to provide privacy guarantees," said Merrill.

Speranza achieves privacy protection through an original concept called "identity co-commitments." Here is a simplified explanation of how the concept works: a software developer's identity, in the form of an email address, is converted into a "commitment" consisting of a large pseudorandom number (a pseudorandom number does not meet the technical definition of "random," but in practice it is almost as good). Meanwhile, another large pseudorandom number - the accompanying commitment (or co-commitment) - is generated and associated with a software package that the developer either created or was granted permission to modify.

To show potential users of a specific software package who created and signed this version of the package, the authorized developer releases a proof that establishes a clear connection between the commitment representing their identity and the commitment associated with the software product. The proof is of a special type called a zero-knowledge proof, which is a way of showing that two things share a common bound without revealing what those things, such as the developer's email address, actually are.
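One common way to build such a proof, shown here as a sketch and not as Speranza's actual protocol, is a Schnorr-style proof that two Pedersen-style commitments hide the same value. Dividing one commitment by the other cancels the hidden value and leaves only a power of H, and the prover shows knowledge of that exponent without revealing anything else. The toy parameters and all function names below are assumptions for illustration.

```python
import hashlib
import secrets

# Toy parameters (assumption, illustration only); see the caveats above.
P, Q, G, H = 2039, 1019, 4, 9


def commit(value: int, blinding: int) -> int:
    """Pedersen-style commitment: G^value * H^blinding mod P."""
    return pow(G, value, P) * pow(H, blinding, P) % P


def _challenge(c1: int, c2: int, t: int) -> int:
    """Derive the challenge from the public values (Fiat-Shamir style)."""
    digest = hashlib.sha256(f"{c1}|{c2}|{t}".encode()).digest()
    return int.from_bytes(digest, "big")


def prove_same_value(c1: int, r1: int, c2: int, r2: int) -> tuple[int, int]:
    """Prove c1 and c2 hide the same value, given only their blindings."""
    # If both commitments hide the same value, c1 / c2 = H^(r1 - r2).
    delta = (r1 - r2) % Q
    k = secrets.randbelow(Q)
    t = pow(H, k, P)
    c = _challenge(c1, c2, t)
    z = (k + c * delta) % Q
    return c, z


def verify_same_value(c1: int, c2: int, proof: tuple[int, int]) -> bool:
    """Check the proof using only the two public commitments."""
    c, z = proof
    ratio = c1 * pow(c2, P - 2, P) % P  # c1 / c2 via Fermat inverse
    # Recompute the prover's nonce commitment: H^z * ratio^(-c) mod P.
    t = pow(H, z, P) * pow(ratio, (-c) % Q, P) % P
    return _challenge(c1, c2, t) == c
```

The verifier never sees the hidden value or the blinding factors. It learns only that the identity commitment and the package commitment were built from the same underlying identity.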

"Speranza ensures that software comes from the right source without requiring developers to disclose personal information, such as their email addresses," commented Marina Moore, a doctoral candidate at the New York University Center for Cybersecurity. "It allows verifiers to see that the same developer has signed a package multiple times without revealing who the developer is or even disclosing other packages they are responsible for. This provides usability improvements over long-term signing keys and privacy benefits over other OIDC-based solutions like Sigstore."

Marcela Melara, a research scientist at Intel Labs' Security and Privacy Research Group, agrees. "The advantage of this approach is that it allows software consumers to automatically verify that the packages they obtain from a Speranza-enabled repository come from the expected maintainers, and to gain trust that the software they use is authentic."