Admin, Jul 19, 2022

Replica Analytics’ new estimator uses synthetic data for more reliable evaluation of re-identification risk in datasets

OTTAWA, July 19, 2022 — Replica Analytics, an Aetion company, has unveiled a novel approach to use synthetic data for a more accurate assessment of re-identification risks in datasets, to better manage privacy risks and enable greater data sharing in healthcare and other sectors.
“Re-identification risk is the probability that an adversary will correctly match a record in a dataset with a real person and until now, there has been no sufficiently reliable measure of this risk,” said Dr. Khaled El Emam, Senior Vice-President and General Manager of Replica Analytics, the premier science-based synthetic data generation technology provider to the healthcare industry. “Access to data and sharing de-identified datasets remain a challenge, in part due to privacy concerns. The re-identification risk estimator we have developed should help data custodians overcome those challenges.”

Most existing estimators provide only a proxy for risk and rely on strong assumptions, because the real population data needed to calculate the risk directly is rarely available. Replica’s estimator leverages data synthesis technology to simulate the unavailable population dataset, so that re-identification risks can be calculated much more accurately. Synthetic data generation (SDG) involves training a machine learning model to learn the statistical patterns and properties of a real dataset. The trained model, when implemented properly, is then used to create a synthetic dataset that maintains the traits of the original dataset but has no one-to-one mapping back to a person, which is how synthetic data mitigates privacy risks.
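To make the idea of calculating risk against a population concrete, here is a minimal Python sketch. It is not Replica's estimator. It assumes a common textbook formulation in which the risk for a record is 1 divided by the number of population records that share its quasi-identifier values. The function name, the column names and the toy data are all hypothetical, and the "population" here is a hand-written stand-in for what a synthesis model would simulate.

```python
from collections import Counter


def average_match_risk(sample, population, quasi_identifiers):
    """Average of 1/F over the sample records, where F is the number of
    population records that share a record's quasi-identifier values.

    Illustrative assumption only: this is a simple textbook-style measure,
    not the estimator described in the PLOS ONE study.
    """
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    population_counts = Counter(key(row) for row in population)
    risks = []
    for row in sample:
        # Each sample record is assumed to belong to the population,
        # so its group contains at least one record.
        group_size = max(population_counts[key(row)], 1)
        risks.append(1.0 / group_size)
    return sum(risks) / len(risks)


# Hypothetical toy data: three released records and a stand-in population.
QUASI_IDENTIFIERS = ["age_band", "sex"]
sample = [
    {"age_band": "30-39", "sex": "F"},
    {"age_band": "40-49", "sex": "M"},
    {"age_band": "50-59", "sex": "F"},
]
population = (
    [{"age_band": "30-39", "sex": "F"}] * 4
    + [{"age_band": "40-49", "sex": "M"}] * 2
    + [{"age_band": "50-59", "sex": "F"}] * 1
    + [{"age_band": "30-39", "sex": "M"}] * 3
)

risk_from_sample_only = average_match_risk(sample, sample, QUASI_IDENTIFIERS)
risk_from_population = average_match_risk(sample, population, QUASI_IDENTIFIERS)
print(risk_from_sample_only, risk_from_population)
```

In this toy case every record looks unique when only the sample is considered, while the population view shows that two of the three records share their values with other people. That gap is the reason a simulated population can matter for the estimate.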

“Measuring re-identification risk using a synthetic estimator to enable data sharing,” a study recently published in the journal PLOS ONE, includes a detailed analysis of the concepts behind Replica’s new risk estimator, an evaluation of its performance and relevant case studies. The results show that the estimator reliably outperforms other approaches across different dataset sizes and levels of complexity, achieving a high degree of accuracy and offering a consistent estimate of the probability of re-identification. The study was also the focus of a webinar and blog post.

The new approach is another example of the usefulness and effectiveness of SDG technology in assessing and mitigating privacy risks and enabling data sharing. Replica’s estimator can now be used through the Replica Synthesis software to better assess re-identification risks in real datasets. If the risk is deemed too high, organizations can choose to synthesize the data and then use the company’s privacy assurance functionality to measure the remaining risk in the synthetic data and demonstrate that it is much lower than in the real data.

About Replica Analytics, an Aetion company

Replica Analytics is the premier science-based SDG technology provider to the healthcare industry. The company is a pioneer in the development of unique technologies for generating privacy-protective synthetic data that maintain the statistical properties of real-world data (RWD). The company was acquired in late 2021 by Aetion, the leading regulatory-grade real-world evidence (RWE) technology provider. Replica Synthesis software provides a full suite of synthetic data generation and evaluation capabilities that can solve multiple grand challenges facing the life sciences industry, and health research in general. Learn more.

About Aetion

Aetion is a healthcare analytics company that delivers real-world evidence for the manufacturers, purchasers, and regulators of medical treatments and technologies. The Aetion Evidence Platform® analyzes data from the real world to produce transparent, rapid, and scientifically validated answers on safety, effectiveness, and value. Founded by Harvard Medical School faculty members with decades of experience in epidemiology and health outcomes research, Aetion informs healthcare’s most critical decisions (what works best, for whom, and when) to guide product development, commercialization, and payment innovation. Follow us at @aetioninc.