Lucas Lange | publications

2025

Slice It up: Unmasking User Identities in Smartwatch Health Data

Lucas Lange, Tobias Schreieder, Victor Christen, and Erhard Rahm

20th ACM ASIA Conference on Computer and Communications Security (AsiaCCS 2025) [accepted] (Aug. 2025)

Wearables are widely used for health data collection due to their availability and advanced sensors, enabling smart health applications like stress detection. However, the sensitivity of personal health data raises significant privacy concerns. While user de-identification by removing direct identifiers such as names and addresses is commonly employed to protect privacy, the data itself can still be exploited to re-identify individuals. We introduce a novel framework for similarity-based Dynamic Time Warping (DTW) re-identification attacks on time series health data. Using the WESAD dataset and two larger synthetic datasets, we demonstrate that even short segments of sensor data can achieve perfect re-identification with our Slicing-DTW-Attack. Our attack is independent of training data and computes similarity rankings in about 2 minutes for 10,000 subjects on a single CPU core. These findings highlight that de-identification alone is insufficient to protect privacy. As a defense, we show that adding random noise to the signals significantly reduces re-identification risk while only moderately affecting usability in stress detection tasks, offering a promising approach to balancing privacy and utility.
@inproceedings{langeSliceItUnmasking2025, title = {Slice It up: {{Unmasking User Identities}} in {{Smartwatch Health Data}}}, author = {Lange, Lucas and Schreieder, Tobias and Christen, Victor and Rahm, Erhard}, year = {2025}, month = aug, booktitle = {20th ACM ASIA Conference on Computer and Communications Security (AsiaCCS 2025) [accepted]}, doi = {10.48550/arXiv.2308.08310}, url = {http://arxiv.org/abs/2308.08310}, urldate = {2023-10-17}, code = {https://github.com/tobiasschreieder/dtw-attacks}, pdf = {https://arxiv.org/pdf/2308.08310.pdf}, bibtexshow = {false}, selected = {true} }
Federated Learning With Individualized Privacy Through Client Sampling

Lucas Lange, Ole Borchardt, and Erhard Rahm

10th International Conference on Machine Learning Technologies (ICMLT 2025) [accepted] (May 2025)

With growing concerns about user data collection, individualized privacy has emerged as a promising solution to balance protection and utility by accounting for diverse user privacy preferences. Instead of enforcing a uniform level of anonymization for all users, this approach allows individuals to choose privacy settings that align with their comfort levels. Building on this idea, we propose an adapted method for enabling Individualized Differential Privacy (IDP) in Federated Learning (FL) by handling clients according to their personal privacy preferences. By extending the SAMPLE algorithm from centralized settings to FL, we calculate client-specific sampling rates based on their heterogeneous privacy budgets and integrate them into a modified IDP-FedAvg algorithm. We test this method under realistic privacy distributions and multiple datasets. The experimental results demonstrate that our approach achieves clear improvements over uniform DP baselines, reducing the trade-off between privacy and utility. Compared to the alternative SCALE method in related work, which assigns differing noise scales to clients, our method performs notably better. However, challenges remain for complex tasks with non-i.i.d. data, primarily stemming from the constraints of the decentralized setting.
@inproceedings{langeFederatedLearningIndividualized2025, title = {Federated {{Learning With Individualized Privacy Through Client Sampling}}}, author = {Lange, Lucas and Borchardt, Ole and Rahm, Erhard}, year = {2025}, month = may, booktitle = {10th International Conference on Machine Learning Technologies (ICMLT 2025) [accepted]}, doi = {10.48550/arXiv.2501.17634}, url = {http://arxiv.org/abs/2501.17634}, urldate = {2025-01-30}, keywords = {Computer Science - Computer Vision and Pattern Recognition,Computer Science - Cryptography and Security,Computer Science - Machine Learning}, code = {https://github.com/luckyos-code/flidp}, pdf = {http://arxiv.org/pdf/2501.17634.pdf}, bibtexshow = {false}, selected = {true} }
Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning

Lucas Lange, Maurice-Maximilian Heykeroth, and Erhard Rahm

21st Conference on Database Systems for Business, Technology and Web (BTW 2025) (May 2025)

Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. Through analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while high entropy or low Fisher Discriminant Ratio (FDR) datasets deteriorate the utility-privacy trade-off. These insights offer valuable guidance for practitioners and researchers in estimating and optimizing the utility-privacy trade-off in image datasets, helping to inform data and privacy modifications for better outcomes based on dataset characteristics.
@inproceedings{langeAssessingImpactImage2025, title = {Assessing the {{Impact}} of {{Image Dataset Features}} on {{Privacy-Preserving Machine Learning}}}, booktitle = {21st {{Conference}} on {{Database Systems}} for {{Business}}, {{Technology}} and {{Web}} ({{BTW}} 2025)}, author = {Lange, Lucas and Heykeroth, Maurice-Maximilian and Rahm, Erhard}, year = {2025}, pages = {589--612}, publisher = {Gesellschaft f{\"u}r Informatik, Bonn}, url = {https://dl.gi.de/handle/20.500.12116/45890}, urldate = {2025-03-06}, langid = {english}, code = {https://github.com/luckyos-code/dataset-analysis-ppml}, pdf = {https://dl.gi.de/server/api/core/bitstreams/cc5101ff-d3e8-4c5b-a466-896180a8f24f/content}, bibtexshow = {true}, selected = {true} }

2024

Property Inference as a Regression Problem: Attacks and Defense

Joshua Stock, Lucas Lange, Erhard Rahm, and Hannes Federrath

21st International Conference on Security and Cryptography (SECRYPT 2024) (Jul. 2024)

In contrast to privacy attacks focussing on individuals in a training dataset (e.g., membership inference), Property Inference Attacks (PIAs) are aimed at extracting population-level properties from trained Machine Learning (ML) models. These sensitive properties are often based on ratios, such as the ratio of male to female records in a dataset. If a company has trained an ML model on customer data, a PIA could for example reveal the demographics of their customer base to a competitor, compromising a potential trade secret. For ratio-based properties, inferring over a continuous range using regression is more natural than classification. We therefore extend previous white-box and black-box attacks by modelling property inference as a regression problem. For the black-box attack we further reduce prior assumptions by using an arbitrary attack dataset, independent from a target model’s training data. We conduct experiments on three datasets for both white-box and black-box scenarios, indicating promising adversary performances in each scenario with a test R^2 between 0.6 and 0.86. We then present a new defense mechanism based on adversarial training that successfully inhibits our black-box attacks. This mechanism proves to be effective in reducing the adversary’s R^2 from 0.63 to 0.07 and induces practically no utility loss, with the accuracy of target models dropping by no more than 0.2 percentage points.
@inproceedings{stockPropertyInferenceRegression2024, title = {Property {{Inference}} as a {{Regression Problem}}: {{Attacks}} and {{Defense}}}, shorttitle = {Property {{Inference}} as a {{Regression Problem}}}, booktitle = {21st {{International Conference}} on {{Security}} and {{Cryptography}} ({{SECRYPT}} 2024)}, author = {Stock, Joshua and Lange, Lucas and Rahm, Erhard and Federrath, Hannes}, year = {2024}, month = jul, pages = {876--885}, publisher = {SciTePress}, doi = {10.5220/0012863800003767}, url = {https://www.scitepress.org/PublicationsDetail.aspx?ID=VOY5GiNn9B4=&t=1}, urldate = {2024-08-06}, code = {https://github.com/joshua-stock/bb-pia}, isbn = {978-989-758-709-2}, pdf = {https://dbs.uni-leipzig.de/files/research/publications/proc\_paper.pdf}, bibtexshow = {true}, selected = {false} }
Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection

Lucas Lange, Nils Wenzlitschke, and Erhard Rahm

Sensors (May 2024)

Smartwatch health sensor data are increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprise sensitive personal information and are resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress, employing Generative Adversarial Networks (GANs) and Differential Privacy (DP) safeguards. Our method not only protects patient information but also enhances data availability for research. To ensure its usefulness, we test synthetic data from multiple GANs and employ different data enhancement strategies on an actual stress detection task. Our GAN-based augmentation methods demonstrate significant improvements in model performance, with private DP training scenarios observing an 11.90–15.48% increase in F1-score, while non-private training scenarios still see a 0.45% boost. These results underline the potential of differentially private synthetic data in optimizing utility–privacy trade-offs, especially with the limited availability of real training samples. Through rigorous quality assessments, we confirm the integrity and plausibility of our synthetic data, which, however, are significantly impacted when increasing privacy requirements.
@article{langeGeneratingSyntheticHealth2024a, title = {Generating {{Synthetic Health Sensor Data}} for {{Privacy-Preserving Wearable Stress Detection}}}, author = {Lange, Lucas and Wenzlitschke, Nils and Rahm, Erhard}, year = {2024}, month = may, journal = {Sensors}, volume = {24}, number = {10}, pages = {3052}, publisher = {{Multidisciplinary Digital Publishing Institute}}, issn = {1424-8220}, doi = {10.3390/s24103052}, url = {https://www.mdpi.com/1424-8220/24/10/3052}, urldate = {2024-05-13}, code = {https://github.com/luckyos-code/Privacy-Preserving-Smartwatch-Health-Data-Generation-Using-DP-GANs}, copyright = {http://creativecommons.org/licenses/by/3.0/}, langid = {english}, pdf = {https://www.mdpi.com/1424-8220/24/10/3052/pdf}, keywords = {differential privacy,generative adversarial network,physiological sensor data,privacy-preserving machine learning,smart health,smartwatch,stress recognition,synthetic data,time series}, bibtexshow = {true}, selected = {false} }

2023

Privacy-Preserving Stress Detection Using Smartwatch Health Data

Lucas Lange, Borislav Degenkolb, and Erhard Rahm

4. Interdisciplinary Privacy & Security at Large Workshop, INFORMATIK 2023 (Sep. 2023)

We present the first privacy-preserving approach for stress detection from wrist-worn wearables based on the Time-Series Classification Transformer (TSCT) architecture and incorporating Differential Privacy (DP) to ensure provable privacy guarantees. The non-private baseline results prove the TSCT to be an effective model for the given task. Our DP experiments then show that the private models suffer from reduced utility but can still be used for reliable stress detection depending on the application. Our proposed approach has potential applications in smart health, where it can be used to monitor smartwatch users’ stress levels without compromising their privacy and provide timely interventions or suggestions to prevent adverse health outcomes. Another primary contribution is our evaluation, which studies and shows negative effects of DP regarding model training. The results of this work provide perspectives for future research and applications wherever the fields of stress detection and data privacy intersect.
@inproceedings{langePrivacyPreservingStressDetection2023, title = {Privacy-{{Preserving Stress Detection Using Smartwatch Health Data}}}, booktitle = {4. {{Interdisciplinary Privacy}} \& {{Security}} at {{Large Workshop}}, {{INFORMATIK}} 2023}, author = {Lange, Lucas and Degenkolb, Borislav and Rahm, Erhard}, year = {2023}, month = sep, publisher = {{Gesellschaft f{\"u}r Informatik e.V.}}, doi = {10.18420/inf2023_66}, url = {https://doi.org/10.18420/inf2023_66}, urldate = {2023-12-14}, code = {https://github.com/luckyos-code/Privacy-Preserving-Stress-Transformer}, isbn = {978-3-88579-731-9}, langid = {english}, pdf = {https://dl.gi.de/server/api/core/bitstreams/23387493-d22e-42e2-98f1-45d297d94628/content}, bibtexshow = {true}, selected = {false} }
Privacy-Preserving Sentiment Analysis on Twitter

Felix Vogel and Lucas Lange

SKILL 2023 (Sep. 2023)

Sentiment analysis is a crucial tool to evaluate customer opinion on products and services. However, analyzing social media data raises concerns about privacy violations since users may share sensitive information in their posts. In this work, we propose a privacy-preserving approach for sentiment analysis on Twitter data using Differential Privacy (DP). We first implement a non-private baseline model and assess the impact of various settings and preprocessing methods. We then extend this approach with DP under multiple privacy parameters ε = 0.1, 1, 10 and finally evaluate the usability of the resulting private models. Our results show that DP models can maintain high accuracy for the studied task. We contribute to the development of privacy-preserving machine learning for customer opinion analysis and provide insights into trade-offs between privacy and utility. The proposed approach helps protect sensitive information while still allowing for valuable insights to be gained from social media data.
@inproceedings{twitterplaceholder, title = {Privacy-Preserving Sentiment Analysis on Twitter}, booktitle = {{{SKILL}} 2023}, author = {Vogel, Felix and Lange, Lucas}, year = {2023}, month = sep, publisher = {{Gesellschaft f{\"u}r Informatik e.V.}}, code = {https://github.com/felix2246/dp-sent-analysis-twitter}, pdf = {https://dbs.uni-leipzig.de/file/SKILL2023_private_twitter_sentiment-6.pdf}, bibtexshow = {true}, selected = {false} }
Privacy in Practice: Private COVID-19 Detection in X-Ray Images

Lucas Lange, Maja Schneider, Peter Christen, and Erhard Rahm

20th International Conference on Security and Cryptography (SECRYPT 2023) (Jul. 2023)

Machine learning (ML) can help fight pandemics like COVID-19 by enabling rapid screening of large volumes of images. To perform data analysis while maintaining patient privacy, we create ML models that satisfy Differential Privacy (DP). Previous works exploring private COVID-19 models are in part based on small datasets, provide weaker or unclear privacy guarantees, and do not investigate practical privacy. We suggest improvements to address these open gaps. We account for inherent class imbalances and evaluate the utility-privacy trade-off more extensively and over stricter privacy budgets. Our evaluation is supported by empirically estimating practical privacy through black-box Membership Inference Attacks (MIAs). The introduced DP should help limit leakage threats posed by MIAs, and our practical analysis is the first to test this hypothesis on the COVID-19 classification task. Our results indicate that needed privacy levels might differ based on the task-dependent practical threat from MIAs. The results further suggest that with increasing DP guarantees, empirical privacy leakage only improves marginally, and DP therefore appears to have a limited impact on practical MIA defense. Our findings identify possibilities for better utility-privacy trade-offs, and we believe that empirical attack-specific privacy estimation can play a vital role in tuning for practical privacy.
@inproceedings{langePrivacyPracticePrivate2023a, title = {Privacy in {{Practice}}: {{Private COVID-19 Detection}} in {{X-Ray Images}}}, booktitle = {20th {{International Conference}} on {{Security}} and {{Cryptography}} ({{SECRYPT}} 2023)}, author = {Lange, Lucas and Schneider, Maja and Christen, Peter and Rahm, Erhard}, year = {2023}, month = jul, pages = {624--633}, publisher = {{SciTePress}}, doi = {10.5220/0012048100003555}, url = {https://doi.org/10.5220/0012048100003555}, urldate = {2023-07-21}, code = {https://github.com/luckyos-code/mia-covid}, isbn = {978-989-758-666-8}, pdf = {https://dbs.uni-leipzig.de/files/research/publications/2023-7/pdf/proc_paper.pdf}, keywords = {COVID-19 Detection,Differential Privacy,Differentially-Private Stochastic Gradient Descent,Membership Inference Attack,Practical Privacy,Privacy-Preserving Machine Learning}, bibtexshow = {true}, selected = {false} }

2020

SentArg: A Hybrid Doc2Vec/DPH Model with Sentiment Analysis Refinement

Christian Staudte and Lucas Lange

CLEF 2020 Working Notes (Sep. 2020)

In this work we explore the yet untested inclusion of sentiment analysis in the argument ranking process. By utilizing a word embedding model we create document embeddings for all queries and arguments. These are compared with each other to calculate top-N argument context scores for each query. We also calculate top-N DPH scores with the Terrier Framework. This way, each query receives two lists of top-N arguments. Afterwards we form an intersection of both argument lists and sort the result by the DPH scores. To further increase the ranking quality, we sort the final arguments of each query by sentiment values. Our findings ultimately imply that rewarding neutral sentiments can decrease the quality of the retrieval outcome.
@inproceedings{staudteSentArgHybridDoc2Vec2020, title = {{{SentArg}}: {{A Hybrid Doc2Vec}}/{{DPH Model}} with {{Sentiment Analysis Refinement}}}, booktitle = {{{CLEF}} 2020 {{Working Notes}}}, author = {Staudte, Christian and Lange, Lucas}, editor = {Cappellato, Linda and Eickhoff, Carsten and Ferro, Nicola and N{\'e}v{\'e}ol, Aur{\'e}lie}, year = {2020}, month = sep, series = {{{CEUR Workshop Proceedings}}}, volume = {2696}, publisher = {{CEUR}}, address = {{Thessaloniki, Greece}}, issn = {1613-0073}, url = {http://ceur-ws.org/Vol-2696/#paper_191}, urldate = {2022-10-20}, code = {https://github.com/luckyos-code/ArgU}, langid = {english}, pdf = {https://ceur-ws.org/Vol-2696/paper\_191.pdf}, bibtexshow = {true}, selected = {false} }