Project Summary
Policy development is driven by information. Some of that information concerns people and organizations and includes personally identifiable information (PII) or other sensitive data, which typically cannot be released. If such data is to be released to justify a policy, or the manner in which a policy is implemented, it must first be sanitized to remove the sensitive data points. Sanitization must also preserve the utility of the data, so that the policies and practices can still be justified from what is released.

Improving transportation infrastructure requires gathering data to develop or validate transportation policies and practices. Sometimes this data must be made available to third parties so that they can carry out their own analyses or independently confirm claims about policy elements and the way policies are put into practice. Data can be released in two forms: raw data, which is the data as gathered, and aggregated data, in which only summaries are released. Data is often “sanitized,” or anonymized, to prevent the release of sensitive information. Inferences drawn from the anonymized or aggregated data must be the same as those that would be drawn from the unanonymized data.

The goal of this project is to identify and precisely characterize the gaps in existing research on data sanitization, with the aim of preserving the balance between utility and privacy. To identify the research gaps that create uncertainty, the researcher will meet with a variety of parties, including policy makers, data analysts, and privacy officers, to answer the following questions: What are the policy requirements for data analysis? How does policy drive the analysis of the data? What parts of the data can safely be released? What parts cannot be released? If the data is aggregated, how does the aggregation work?
The researcher will then review the literature on data sanitization, on reversing sanitization, and on the theory underlying sanitization. Combined with the information gathered previously, this review will be used to identify specific gaps in the research.