Bluestreak — Privacy-Aware User Segmentation for Online Advertisement Using Logistic Regression

Mar 1, 2021
Heye Vöcking
An illustration of a Bluestreak cleaner wrasse.
Abstract
Changelog for v1.0.1 (2025-07-15): corrected the cover title from "Linear Regression" to "Logistic Regression" and added hyphenation to "Privacy-Aware"; added PDF metadata (pdftitle, pdfauthor, pdfsubject); minor typo and wording fixes.

The growing awareness of privacy in the digital world has not only made the blocking of third-party cookies more common but has also introduced major regulatory changes through the European General Data Protection Regulation (GDPR). This regulation has fundamentally changed the Internet in general and the online advertising industry in particular: under these conditions, the traditional approach of tracking via user profiles is becoming increasingly difficult. In this thesis, an alternative approach for predicting the age and gender segments of a user is proposed. With the presented Bluestreak method, the sensitive data remains on the user’s device and only the anonymous segment predictions are sent back to the server. It differs from common approaches in that the collection of the required data and the prediction of the desired segments are shifted to the user’s browser. This approach is independent of tracking cookies and thus preserves the user’s privacy. We conducted an evaluation on a real-world data set and show that it is possible to improve the prediction accuracy for age and gender segments compared to a User-Agent-based approach while imposing only a low overhead on users’ devices.
Type
Publication
Zenodo

A TypeScript/JavaScript library and server-side component that shifts age- and gender-segment prediction into the user’s browser, ensuring all sensitive data remains on-device and only anonymous segment labels are sent back to the ad server. By training an L1-regularized logistic regression model on a real-world RTB dataset, Bluestreak improves age-segment accuracy by 4 pp and gender accuracy by 2 pp over a User-Agent-based baseline, without relying on cookies or fingerprinting and adding minimal load-time and resource overhead. Fully GDPR-compliant and compatible with >95 % of modern browsers, it delivers privacy by design while preserving ad-targeting precision.
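To make the on-device idea concrete, here is a minimal sketch of how a browser could score a sparse logistic regression model locally and report only the winning segment label. It is not the Bluestreak implementation: the type names, the feature names (`hourIsEvening`, `touchScreen`, `desktop`), the weights, and the one-vs-rest scheme for choosing among segments are all assumptions made for illustration.

```typescript
// Illustrative sketch only: every name, feature, and weight below is
// hypothetical and not taken from the Bluestreak code base.

type FeatureVector = Record<string, number>;

interface SegmentModel {
  segment: string;                 // anonymous label, e.g. "age:18-24"
  bias: number;
  weights: Record<string, number>; // sparse: L1 regularization drives many weights to zero
}

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Logistic regression score: sigmoid(bias + sum of weight * feature value).
function predictProbability(model: SegmentModel, features: FeatureVector): number {
  let z = model.bias;
  for (const [name, weight] of Object.entries(model.weights)) {
    z += weight * (features[name] ?? 0); // features absent from the vector count as 0
  }
  return sigmoid(z);
}

// One-vs-rest choice (an assumption): return the segment with the highest probability.
function predictSegment(models: SegmentModel[], features: FeatureVector): string {
  let bestSegment: string | undefined;
  let bestProbability = -Infinity;
  for (const model of models) {
    const p = predictProbability(model, features);
    if (p > bestProbability) {
      bestProbability = p;
      bestSegment = model.segment;
    }
  }
  if (bestSegment === undefined) {
    throw new Error("at least one segment model is required");
  }
  return bestSegment;
}

// Only the label leaves the device; the raw feature values are never serialized.
function buildPayload(models: SegmentModel[], features: FeatureVector): { segment: string } {
  return { segment: predictSegment(models, features) };
}

// Made-up example models and features.
const exampleModels: SegmentModel[] = [
  { segment: "age:18-24", bias: -0.5, weights: { hourIsEvening: 1.2, touchScreen: 0.8 } },
  { segment: "age:25-34", bias: 0.1, weights: { hourIsEvening: -0.3, desktop: 0.6 } },
];
const exampleFeatures: FeatureVector = { hourIsEvening: 1, touchScreen: 1, desktop: 0 };

console.log(JSON.stringify(buildPayload(exampleModels, exampleFeatures)));
```

Because the model is sparse, the browser only needs to download and evaluate the non-zero weights, which is one plausible reason an L1 penalty keeps the overhead on the user's device low.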

Authors
Heye Vöcking
Senior Data Engineer
Data & Knowledge Engineer with 10+ years of professional experience transforming petabyte-scale data into knowledge. Currently stress-testing large-language-model alignment, developing jailbreaks, and building real-time knowledge-graph systems. Interests include ML security, physics, Austrian economics, and Bitcoin.