Presentation at SER 2023

Abstract

A shift in data sharing paradigms: a case study on the ways in which big data and complex algorithms allow for increased data sharing while preserving privacy

Lauren B Wilner, Weipeng Zhou, Amy Youngbloom, Stephen J Mooney

Data from personal devices, including GPS trackers and accelerometers, hold considerable potential for public health research. These data can be used to measure walking and to explore determinants of this key cardioprotective behavior across geospatial contexts, free of interviewer, recall, and social desirability biases.

However, personal monitoring devices are expensive, so studies using them rarely enroll large populations. Analyses that pool monitoring data from multiple sites could support broader conclusions, but the data are personally identifying, which precludes sharing them directly. To address this gap, we developed an R package (‘walkboutr’) that extracts patterns consistent with walking as deidentified and sharable ‘walk bouts’.

A walk bout is a period of activity in which accelerometer movement matches the patterns of walking and corresponding GPS measurements confirm travel. The inputs to the walkboutr package are individual-level accelerometry and GPS data. The outputs are walk bouts with corresponding times, durations, and summary statistics on the sample population, which collapse all personally identifying information. These bouts can be used to measure walking both as an outcome of a change to the built environment and, because walking is a cardioprotective behavior, as a predictor of health outcomes.
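The definition above can be illustrated with a minimal Python sketch. It is not the walkboutr implementation or its API: the epoch length, the activity and speed thresholds, and every name in it (`Epoch`, `WalkBout`, `find_walk_bouts`) are hypothetical and chosen only for illustration. The sketch groups consecutive active accelerometer epochs into candidate bouts, keeps those whose median GPS speed falls in a plausible walking range, and returns only non-locational summaries.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

# Hypothetical thresholds for illustration only; NOT walkboutr's parameters.
EPOCH_SECONDS = 30        # assumed length of one accelerometer epoch
MIN_ACTIVE_COUNTS = 500   # assumed counts per epoch that mark an epoch "active"
MIN_BOUT_EPOCHS = 10      # assumed minimum bout length (10 epochs = 5 minutes)
MIN_WALK_KMH = 2.0        # assumed lower bound on median walking speed
MAX_WALK_KMH = 6.0        # assumed upper bound on median walking speed


@dataclass
class Epoch:
    counts: int                  # accelerometer activity counts in this epoch
    speed_kmh: Optional[float]   # GPS speed, or None when there is no fix


@dataclass
class WalkBout:
    start_epoch: int             # offset into the record, not a location
    duration_minutes: float
    median_speed_kmh: float


def find_walk_bouts(epochs: List[Epoch]) -> List[WalkBout]:
    """Return deidentified bout summaries; coordinates are never stored."""
    epochs = list(epochs)
    bouts: List[WalkBout] = []
    run_start: Optional[int] = None
    # A trailing inactive sentinel closes any run that reaches the end.
    for i, epoch in enumerate(epochs + [Epoch(0, None)]):
        active = epoch.counts >= MIN_ACTIVE_COUNTS
        if active and run_start is None:
            run_start = i
        elif not active and run_start is not None:
            start, run = run_start, epochs[run_start:i]
            run_start = None
            if len(run) < MIN_BOUT_EPOCHS:
                continue  # too short to count as a bout
            speeds = [e.speed_kmh for e in run if e.speed_kmh is not None]
            if not speeds:
                continue  # no GPS data to confirm travel
            typical_speed = median(speeds)
            if MIN_WALK_KMH <= typical_speed <= MAX_WALK_KMH:
                bouts.append(
                    WalkBout(start, len(run) * EPOCH_SECONDS / 60, typical_speed)
                )
    return bouts
```

In this sketch, a run of active epochs recorded at driving speed is rejected by the GPS check, and a short burst of activity is rejected by the duration check, so only runs that satisfy both the accelerometer and the GPS criteria are kept.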

In this case study, we collapsed data from Wave 1 of the Travel Assessment and Community (TRAC) study containing 7924 days of GPS and accelerometry records from 670 participants. The result was a dataset of 4898 deidentified walk bouts that were used to assess the impact of a new transit stop on physical activity and were safely shared with other research groups investigating similar predictor-outcome relationships.

Our case study illustrates how reusable code can produce useful, non-identifying summaries of identifying data. Such code can be used to build larger cross-site datasets that address nuanced scientific questions without compromising participant privacy.

Download Slides Here

walkboutr documentation

CRAN page