U.S. Department of Commerce

10/26/2021 | News release | Distributed by Public on 10/26/2021 15:26

Keeping Your Data Safe: The Differential Privacy Temporal Map Challenge


Data is critical in modern society: it aids decision-making for everything from fighting wildfires to solving homelessness and even promoting fitness. However, managing private data has become a serious security risk for individuals, companies, and governments. In recognition of Cybersecurity Awareness Month, the Department of Commerce (DOC) highlights how the National Institute of Standards and Technology (NIST) uses prize challenges to improve technologies that keep data private. The technologies developed through these challenges enable public safety agencies to share data without compromising the privacy of individuals.

Data for Public Good

NIST's Public Safety Communications Research (PSCR) Division conducts research for public safety agencies that are huge data generators. These agencies collect millions of records every day as they take calls for service, make arrests, respond to incidents, and take medical information from emergency response teams.

With open access to this data, agencies, researchers, and the public can analyze it for insights that lead to better policy. This could have far-reaching impacts, including optimizing response times and personnel placement and improving natural disaster response, epidemic tracking, demographic analysis, and civic planning.

However, those records can also contain sensitive material known as personally identifiable information (PII), which can reveal someone's identity. While these are public records, data collected by public safety agencies cannot be released without proper anonymization. But older techniques like standard data redaction are no longer enough to protect people's privacy. Imagine you gave your PII to a bank and a social media platform, and both got hacked. The bank redacted your home address, but the hackers still got your name and email address. With those, they find your record in the social media breach, which also reveals your phone number and ZIP code. Then the hackers cross-reference these separate data breaches with publicly available information, such as voter records. In just a few minutes, your identity is vulnerable. In fact, 87% of all Americans can be identified with only three pieces of information: their ZIP code, date of birth, and gender.
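The cross-referencing described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real incident: every record, field name, and the link_records helper are made up for the example, and it assumes the three sources share an email address, a name, and a ZIP code as join keys.

```python
# All records below are fictional and exist only to illustrate the idea.
bank_breach = [
    # The bank redacted the home address, but name and email leaked.
    {"name": "Alex Doe", "email": "alex@example.com"},
]
social_breach = [
    {"email": "alex@example.com", "phone": "555-0100", "zip": "20899"},
    {"email": "sam@example.com", "phone": "555-0101", "zip": "20899"},
]
voter_records = [
    {"name": "Alex Doe", "zip": "20899", "address": "1 Main St"},
    {"name": "Alex Doe", "zip": "80305", "address": "9 Elm Ave"},
]


def link_records(bank, social, voters):
    """Join bank rows to social rows on email, then to voter rows on name + ZIP."""
    social_by_email = {row["email"]: row for row in social}
    profiles = []
    for person in bank:
        match = social_by_email.get(person["email"])
        if match is None:
            continue  # no overlap between the two breaches for this person
        profile = {**person, **match}
        for voter in voters:
            if voter["name"] == profile["name"] and voter["zip"] == profile["zip"]:
                # The public record restores the address the bank had redacted.
                profile["address"] = voter["address"]
        profiles.append(profile)
    return profiles


profiles = link_records(bank_breach, social_breach, voter_records)
for profile in profiles:
    print(profile)
```

No single source held the full profile, yet the joined result contains the name, email, phone number, ZIP code, and the address that redaction was supposed to protect.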

The Case for Differential Privacy

One way of protecting PII in datasets is using differential privacy, a relatively new mathematical model that limits how much can be learned about any one person from released data. Differential privacy works by adding carefully tailored "noise" to alter identifiable information within whole datasets, such as ages or addresses. Altering data with this "noise" is like adding static to a radio broadcast. With a little static, you miss a few words, but you still get the gist of what is being said; too much static and the broadcast is unintelligible. The challenge of differential privacy is tuning your mathematical models so the results come out just right, with a balanced level of privacy and utility.
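One common way to add this kind of noise is the Laplace mechanism, shown here as a minimal sketch rather than as the method any challenge team used. The ages are made up, the function names are hypothetical, and the sketch assumes each person contributes exactly one record, so that a count can change by at most 1 when one person is added or removed. The privacy parameter epsilon plays the role of the static dial: a smaller epsilon means more noise and more privacy.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of matching records, with noise scale 1/epsilon.

    Assumes each person contributes one record (sensitivity 1).
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(records, predicate, epsilon, rng, trials=2000):
    """Average distance between the noisy and the true count over many runs."""
    true_count = sum(1 for record in records if predicate(record))
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(records, predicate, epsilon, rng) - true_count)
    return total / trials


def over_40(age):
    return age > 40


rng = random.Random(0)  # fixed seed so the illustration is repeatable
ages = [23, 35, 41, 52, 67, 29, 38, 71, 45, 60]  # made-up data

error_strong_privacy = mean_abs_error(ages, over_40, epsilon=0.1, rng=rng)
error_weak_privacy = mean_abs_error(ages, over_40, epsilon=10.0, rng=rng)
print("average error at epsilon=0.1:", error_strong_privacy)
print("average error at epsilon=10: ", error_weak_privacy)
```

The average error is far larger at the smaller epsilon, which is the privacy and utility trade-off in miniature. A production system would also need to guard against floating-point weaknesses in noise sampling, which this sketch ignores.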

NIST's De-ID Challenges

NIST uses prize challenge competitions to crowdsource solutions from the best and brightest innovators across a multitude of fields. The most recent, the 2020 Differential Privacy Temporal Map Challenge, builds on the results of two 2018 differential privacy challenges to broaden their capabilities. Rather than focusing on datasets with single-use cases, this new challenge looked at temporal map data, which tracks data across both location and time.

"The reason to use a prize challenge for a differential privacy competition is because many diverse techniques can be explored simultaneously. Different solutions have different characteristics, so there is not necessarily a 'best solution' to this problem - it depends on the use case," says NIST prize challenge specialist Gary Howarth. "At the end of the day, you want a diversity of solutions."

The 2020 Differential Privacy Temporal Map Challenge featured 19 entries and 92 participants completing three separate coding sprints using differential privacy methods on temporal map data. Sprint 1 featured data on 911 calls in Baltimore made during the course of a year. Sprint 2 featured census data about simulated individuals in various U.S. states from 2012 to 2018. And Sprint 3 featured records of millions of taxi trips in Chicago. The goal was to create a privacy-preserving dashboard map for each sprint that showed changes across different map segments over time.

Eight teams qualified for the final scoring in Sprint 3, and six achieved submissions that were validated as differentially private. First place went to N-CRiPT of the National University of Singapore, second place went to the Minutemen of UMass Amherst, and third place went to DPSyn of Purdue University. N-CRiPT also won Sprint 2, and the Minutemen won Sprint 1.

The Solution

The 2020 Differential Privacy Temporal Map Challenge demonstrated that differentially private synthetic data is an effective way to preserve both the privacy and the utility of data. Synthetic data is created when data based on real individuals is put through a synthetic data generator (to add "noise") and sanitized. When synthetic data is analyzed to form judgments, no real person in the data is at risk of being identified, and the dataset maintains the same overarching trends.
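A very simple synthetic data generator can be sketched as a noisy histogram over map segments and time periods. This is an illustration only and is much simpler than the challenge entries: the segments, months, counts, and the synthesize function are all made up, and the sketch assumes each individual contributes exactly one record, which was not true of the challenge datasets.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def synthesize(records, cells, epsilon, rng):
    """Emit synthetic records drawn from a noisy count of every cell.

    `cells` must list every possible (segment, period) pair up front, so the
    set of cells that receive noise does not depend on the private data.
    Assumes one record per individual (sensitivity 1).
    """
    true_counts = Counter(records)
    synthetic = []
    for cell in cells:
        noisy = true_counts[cell] + laplace_noise(1.0 / epsilon, rng)
        # Rounding and clamping at zero only post-process the noisy count.
        synthetic.extend([cell] * max(0, round(noisy)))
    return synthetic


# Made-up incident records: (map segment, month).
records = [("A", 1)] * 40 + [("A", 2)] * 55 + [("B", 1)] * 10 + [("B", 2)] * 5
cells = [(segment, month) for segment in ("A", "B") for month in (1, 2)]

synthetic = synthesize(records, cells, epsilon=1.0, rng=random.Random(0))
synthetic_counts = Counter(synthetic)
for cell in cells:
    print(cell, "real:", Counter(records)[cell], "synthetic:", synthetic_counts[cell])
```

The synthetic records are generated only from the noisy counts, never copied from an individual, yet the broad pattern survives: segment A remains much busier than segment B in both months.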

The winning team, N-CRiPT, used Markov random fields to generate synthetic data and had great success with this method. Other solutions from competing teams included probabilistic graphical models, rescaling public data, and using weighting, clipping and archetypes. The best of these solutions demonstrated better performance on both privacy and accuracy than traditional techniques.

The final phase of the 2020 Differential Privacy Temporal Map Challenge occurred in spring 2021. Qualifying teams had the opportunity to advance the quality, generalizability, and openness of their solutions. Following the final algorithm sprint, teams whose algorithms had been validated as differentially private were invited to release their source code under an appropriate open-source license.

What's Next for Differential Privacy

These prize challenges and the innovative solutions developed by competitors are helping the DOC enhance the nation's cybersecurity to protect Americans' privacy, maintain public safety, and support our economic and national security. The strides made for differential privacy in the 2020 Differential Privacy Temporal Map Challenge will make it possible to utilize the vast troves of data held by public safety agencies without risking individual privacy. Because these datasets can now satisfy differential privacy, they can be used by departments to make better decisions that could have life-saving impacts.

NIST is currently organizing all of the differential privacy algorithms and metrics into an open-source repository for public use. Open-source code from four of the 2020 Differential Privacy Temporal Map Challenge teams is currently available on the NIST website, as is the open-source code from the winners of the 2018 Differential Privacy Synthetic Data Challenge. To stay up to date on NIST's PSCR Division and the future of differential privacy, make sure you subscribe to the PSCR newsletter.