Data Scientist / Epidemiologist

Harvard Data Science Initiative Trust in Science / OpenDP Project

OpenDP is a suite of open-source software tools for privacy-protective statistical analysis of sensitive personal data. The OpenDP project at Harvard began as a partnership with Microsoft to develop a differentially private data curator application. Building on this collaboration, our teams at IQSS and SEAS are now building a broader community around OpenDP, with stakeholders and contributors from academia, industry, and government, to design, implement, and govern an “OpenDP Commons” that includes a library of differentially private algorithms and other general-purpose tools for use in end-to-end differential privacy systems.

When CrisisReady partners work with human mobility data, differential privacy is currently implemented on a per-project, per-dataset basis, an approach that is error-prone and potentially imperfect. The OpenDP project, anchored at Harvard’s Institute for Quantitative Social Science, is designed to simplify the end use of differential privacy by providing a software library with bindings to common languages that can be integrated with statistical and machine learning systems. Applying OpenDP to these datasets is likely to be a critical privacy-preserving development that can catalyze the safer use of these datasets by a larger community of scientists worldwide.

JOB DESCRIPTION

We seek a graduate or postdoctoral candidate who will develop and integrate differential privacy algorithms into the OpenDP library, and who will collaborate with library users to understand the developer experience and the long-term scope and efficacy of applying these algorithms to static data sets produced daily at the US census tract level. This work will serve as a model for broader applications. Eligible candidates may be trained in computer science, data science, statistics, or epidemiology; preference will be given to candidates with interdisciplinary work experience. Proficiency in statistical programming is required (Python or Rust preferred).

The candidate will work with an interdisciplinary team comprising Professors Gary King, Salil Vadhan, Merce Crosas, Caroline Buckee, Satchit Balsari, and others at Harvard. The candidate will serve as HDSI Trust in Science Fellow with opportunities to participate in the academic life of several collaborating centers at Harvard.

The candidate will:

  • Support the integration of the OpenDP algorithms into the Camber Systems data pipeline, implemented in PySpark running on AWS.
  • Integrate the OpenDP algorithms into the Dataverse ecosystem.
  • Validate the resulting data sets against other external data sets (e.g., Facebook Data for Good).

Please direct inquiries to jobs@crisisready.io.