Bluestreak — Privacy-Aware User Segmentation for Online Advertisement Using Logistic Regression

Abstract
Changelog for v1.0.1 (2025-07-15):
Corrected cover title: "Linear Regression" -> "Logistic Regression", and added hyphenation to "Privacy-Aware"
Added PDF metadata (pdftitle, pdfauthor, pdfsubject)
Minor typo and wording fixes
---
Abstract
The growing awareness of privacy in the digital world has not only
made the blocking of third-party cookies more common but also introduced major
regulatory changes through the new European General Data Protection Regulation (GDPR).
This regulation has inherently changed the Internet in general and the online
advertising industry in particular: under these conditions, the traditional approach
of tracking via user profiles is becoming increasingly difficult. In this thesis,
an alternative approach for predicting age and gender segments of a user is proposed.
With the presented Bluestreak method, the sensitive data remains on the user’s device
and only the anonymous segment predictions are sent back to the server. It differs
from common approaches in that the collection of the required data and the prediction
of the desired segments are shifted to the user’s browser. This approach is independent
of tracking cookies and thus preserves the user’s privacy. We conducted an evaluation
on a real-world data set and show that it is possible to improve the prediction
accuracy for age and gender segments compared to a User-Agent-based approach while
only imposing a low overhead on users’ devices.
Type
Publication
Zenodo
A TypeScript/JavaScript library and server-side component that shifts age- and gender-segment prediction into the user’s browser, ensuring all sensitive data remains on-device and only anonymous segment labels are sent back to the ad server. By training an L1-regularized logistic regression model on a real-world RTB dataset, Bluestreak improves age-segment accuracy by 4 pp and gender accuracy by 2 pp over a User-Agent-based baseline, without relying on cookies or fingerprinting and adding minimal load-time and resource overhead. Fully GDPR-compliant and compatible with >95% of modern browsers, it delivers privacy by design while preserving ad-targeting precision.
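To make the idea of in-browser prediction concrete, the following is a minimal sketch of logistic regression inference in TypeScript. It is not taken from the Bluestreak code base: the type names, the function names (`predictProbability`, `predictSegment`, `buildPayload`), the feature names, the weights, and the one-vs-rest selection are all assumptions chosen for illustration. It only shows how a sparse weight vector, as an L1-regularized model would produce, can be evaluated on the device so that the payload contains a segment label and no raw features.

```typescript
// Hypothetical sketch; every name and number below is illustrative only.
type FeatureVector = Record<string, number>;

interface SegmentModel {
  segment: string;                 // anonymous label, e.g. "age_18_34"
  bias: number;                    // intercept term
  weights: Record<string, number>; // sparse: L1 regularization drives most weights to zero
}

// Standard logistic function, mapping a real-valued score into (0, 1).
function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Probability that the user belongs to the segment of the given model.
function predictProbability(model: SegmentModel, features: FeatureVector): number {
  let z = model.bias;
  for (const name of Object.keys(model.weights)) {
    // Features that were not observed on this device contribute nothing.
    z += model.weights[name] * (features[name] ?? 0);
  }
  return sigmoid(z);
}

// Assumed one-vs-rest scheme: return the segment with the highest probability.
function predictSegment(models: SegmentModel[], features: FeatureVector): string {
  if (models.length === 0) {
    throw new Error("at least one model is required");
  }
  let best = models[0];
  let bestProbability = predictProbability(best, features);
  for (let i = 1; i < models.length; i++) {
    const p = predictProbability(models[i], features);
    if (p > bestProbability) {
      best = models[i];
      bestProbability = p;
    }
  }
  return best.segment;
}

// The payload carries segment labels only; the feature vector never leaves the device.
function buildPayload(segments: string[]): { segments: string[] } {
  return { segments: segments };
}

// Made-up models and features for illustration.
const models: SegmentModel[] = [
  { segment: "age_18_34", bias: -0.5, weights: { hour_evening: 1.2 } },
  { segment: "age_35_plus", bias: 0.1, weights: { lang_de: 0.3, touch: 0.9 } },
];
const features: FeatureVector = { hour_evening: 1, lang_de: 1 };

const payload = buildPayload([predictSegment(models, features)]);
console.log(JSON.stringify(payload));
```

In a browser, the resulting payload could then be transmitted with a standard API such as `fetch` or `navigator.sendBeacon`; which transport Bluestreak actually uses is not stated here.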
Age Prediction
Client-Side Machine Learning
Edge Computing
Gender Prediction
In-Browser ML
Logistic Regression
Privacy
User Segmentation

Authors
Heye Vöcking
Senior Data Engineer
Data & Knowledge Engineer with 10+ years of professional experience transforming petabyte-scale data into knowledge. Currently stress-testing large-language-model alignment, developing jailbreaks, and building real-time knowledge-graph systems. Interests include ML security, physics, Austrian economics, and Bitcoin.