Exploring Techniques for Synthetic Mobility Data Generation to Protect Privacy
Research Team: Chen-Nee Chuah (lead), Michael Zhang, Samson Cheung, Ammar Haydari, and Dongjie Chen
UC Campus(es): UC Davis
Problem Statement: The widespread adoption of location-based services, smartphones, and smartwatches has made it possible to monitor where people go. This mobility data can be used in a variety of ways, such as smart urban planning and enhancing driver safety in connected and automated vehicles (CAVs). Multiple platforms exist to exchange mobility data, including the Open Mobility Foundation and Mobility Dataspace. However, there are significant privacy concerns associated with this data. Data on people’s movements can inadvertently reveal personal lifestyle patterns, such as home and office locations and frequented points of interest. Simply removing personal identifiers from a dataset does not sufficiently protect privacy, as attackers can still re-identify users from their movement patterns. Thus, privacy concerns tend to deter the free sharing of mobility and CAV data.
Project Description: The proposed project focuses on synthetic mobility data generation to address two challenges: privacy concerns and the lack of publicly available mobility and CAV data. One way to deal with data privacy is to generate synthetic data that is similar to the raw data but scrubbed of personally identifiable information. While there are several techniques to preserve privacy for aggregated mobility datasets, producing synthetic mobility datasets at the individual level is still challenging. Specifically, it is hard to generate mobility trajectories that preserve privacy (for example, do not give clues about home and work locations), demonstrate realistic mobility patterns, and follow real-world conditions and traffic rules. In this project, the researchers will use state-of-the-art techniques to sanitize raw datasets and generate synthetic mobility data that retain information about people’s movements and driving behavior. The project aims to provide a foundational building block for a future data exchange platform that will preserve privacy and enable the sharing of mobility datasets.
Status: In Progress
Budget: $100,000