Synthetic Data Anonymity

Assessing the residual privacy risks of synthetic data

In medical research, and for knowledge development more broadly, machine learning models can now be trained on real patient data to generate synthetic (fictitious) patient profiles. How can we characterize the residual privacy risks of these synthetic profiles?

LATECE domain

Connected health

LATECE values

  • Social conscience
  • Public service
  • Knowledge development

Nature of the project

Designing a model to assess the privacy risk of synthetic data

How can we characterize the residual privacy risks of machine-generated synthetic data?

As part of an international competition, we participated in a membership inference attack (MIA) exercise aimed at identifying and extracting private information from a synthetic dataset generated from real personal data.
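The competition's actual attack is not described here, but the general idea of a membership inference attack on synthetic data can be sketched with a toy distance-to-closest-record attack: guess that a target record was in the training set when a synthetic record lies unusually close to it. Everything below (the leaky toy generator, the distance threshold) is an illustrative assumption, not the team's method:

```python
import math
import random

def nearest_distance(record, synthetic):
    """Euclidean distance from a record to its closest synthetic record."""
    return min(math.dist(record, s) for s in synthetic)

def mia_guess(record, synthetic, threshold):
    """Guess 'member' when some synthetic record is suspiciously close."""
    return nearest_distance(record, synthetic) < threshold

# Toy generator that leaks: it copies training records with small noise.
random.seed(0)
train = [[random.random(), random.random()] for _ in range(50)]
holdout = [[random.random(), random.random()] for _ in range(50)]
synthetic = [[x + random.gauss(0, 0.01) for x in r] for r in train]

threshold = 0.05
tpr = sum(mia_guess(r, synthetic, threshold) for r in train) / len(train)
fpr = sum(mia_guess(r, synthetic, threshold) for r in holdout) / len(holdout)
print(f"attack TPR={tpr:.2f}, FPR={fpr:.2f}")  # a large TPR/FPR gap signals leakage
```

A well-protected generator would drive the attack's true-positive rate down toward its false-positive rate, leaving the attacker no better than random guessing.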

The UQAM-CHUM team demonstrated that certain safeguards are needed to better protect personal data used by generative models.

The project is developing a library of synthetic data generation models that ensure differential privacy and support MIA exercises to evaluate the level of residual risk in synthetic data.
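The project's library is not reproduced here, but the differential-privacy building block such a library relies on can be illustrated with a minimal Laplace mechanism: a numeric query result is released with noise calibrated to the query's sensitivity and a privacy budget epsilon. The function name and parameters below are hypothetical, for illustration only:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    # Sample Laplace noise by inverse transform of a uniform draw on (-0.5, 0.5).
    u = random.random() - 0.5
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_value + noise

# Example: a count query (e.g., patients with a given condition) has sensitivity 1,
# since adding or removing one patient changes the count by at most 1.
random.seed(1)
draws = [laplace_mechanism(42, sensitivity=1, epsilon=1.0) for _ in range(10_000)]
print(sum(draws) / len(draws))  # noisy but unbiased: close to the true count 42
```

Smaller epsilon values add more noise and give stronger privacy guarantees at the cost of utility, which is exactly the utility/privacy balance the project evaluates.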

Target audiences

  • General public
  • Healthcare system
  • Medical and AI research

Start date

2023

End date

2026

Leader

Sébastien Gambs, PhD

LATECE student contributor

  • Hadrien Lautraite

Collaborators

Jean-François Rajotte, Lorrie Herbault, Yue Qi, Michaël Chassé, CHUM

Keywords

#ConnectedHealth #AI #PersonalData #SyntheticData #Privacy

Partners and Funders

  • FRQS

Total funding

Publications

CLOVER: A framework for benchmarking synthetic data generation methods balancing utility and privacy in healthcare. Yue Qi, Lorrie Herbault, Hadrien Lautraite, Michael Yu, Katleen Blanchet, Christian Vincelette, Louis Mullie, Guillaume Dumas, Jean-François Rajotte, Kamran Afzali, Sébastien Gambs, Michaël Chassé. Artificial Intelligence in the Life Sciences, 2026

Media coverage