Synthetic Data Anonymity
Assessing the residual privacy risks of synthetic data
In medical research, and in knowledge development more broadly, machine learning models can now be trained on real patient data to generate synthetic (fictitious) patient profiles. How can we characterize the residual privacy risks of these synthetic profiles?

LATECE domain
Connected health
LATECE values
- Social conscience
- Public service
- Knowledge development
Nature of the project
Designing a model to assess the privacy risk of synthetic data
The challenge
How can we characterize the residual privacy risks of machine-generated synthetic data?
The solutions
As part of an international competition, we carried out a membership inference attack (MIA) exercise aimed at finding and extracting private information from a synthetic dataset generated from real personal data.
The UQAM-CHUM team demonstrated that certain safeguards are needed to better protect personal data used by generative models.
Development of a library of synthetic data generation models that ensure differential privacy and enable MIA exercises to evaluate the level of residual privacy risk in synthetic data.
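To illustrate the kind of evaluation involved, here is a minimal sketch of a distance-based membership inference attack on tabular data. This is a hypothetical toy example, not the UQAM-CHUM team's actual attack or the CLOVER framework: the attacker guesses that a candidate record was in the generator's training set when its nearest synthetic neighbour is unusually close. All data, thresholds, and function names below are illustrative assumptions.

```python
# Toy distance-based membership inference attack (MIA) sketch.
# Assumption: a leaky generator produces synthetic records close to its
# training data, so small nearest-neighbour distances betray membership.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: training records ("members"), held-out records ("non-members"),
# and synthetic records generated very close to the training data.
members = rng.normal(0.0, 1.0, size=(100, 5))
non_members = rng.normal(0.0, 1.0, size=(100, 5))
synthetic = members + rng.normal(0.0, 0.05, size=members.shape)

def min_distance(record, synth):
    """Euclidean distance from a candidate record to its closest synthetic record."""
    return np.min(np.linalg.norm(synth - record, axis=1))

def mia_predict(candidates, synth, threshold):
    """Predict 'member' when the closest synthetic record is nearer than threshold."""
    return np.array([min_distance(r, synth) < threshold for r in candidates])

threshold = 0.5  # illustrative; in practice calibrated on reference data
tpr = mia_predict(members, synthetic, threshold).mean()      # true positive rate
fpr = mia_predict(non_members, synthetic, threshold).mean()  # false positive rate
print(f"attack TPR={tpr:.2f}, FPR={fpr:.2f}")
```

A TPR well above the FPR signals residual privacy risk in the synthetic data; a generator with differential privacy guarantees should keep the attack close to random guessing (TPR ≈ FPR).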
Target audiences
- General public
- Healthcare system
- Medical and AI research
Start date
2023
End date
2026
Leader
LATECE student contributor
- Hadrien Lautraite
Collaborators
Jean-François Rajotte, Lorrie Herbault, Yue Qi, Michaël Chassé, CHUM
Keywords
#ConnectedHealth #AI #PersonalData #SyntheticData #Privacy
Partners and Funders
- FRQS
Total funding
$30,000
Publications
CLOVER: A framework for benchmarking synthetic data generation methods balancing utility and privacy in healthcare. Yue Qi, Lorrie Herbault, Hadrien Lautraite, Michael Yu, Katleen Blanchet, Christian Vincelette, Louis Mullie, Guillaume Dumas, Jean-François Rajotte, Kamran Afzali, Sébastien Gambs, Michaël Chassé. Artificial Intelligence in the Life Sciences, 2026
Media coverage

