Life as a Doctoral Candidate: the many facets of doing research

#04 David Kovacs: Data, Data, Data – How to handle sensitive eye information?


"The real risk emerges from how this data is processed and shared"

Research is all about data. Eye tracking recordings provide us with enormous amounts of quantitative data. For example, 5 minutes of eye movement recording can contain tens of thousands of data points, capturing fixations, saccades, blinks, and pupil size changes. In Eyes4ICU, we study how to use this data for User Understanding, for Gaze Communication, and for Application in In-the-Wild settings, that is, outside the lab. In his PhD project at the IT University of Copenhagen, David Kovacs is investigating a specific and important aspect of gaze data that is relevant to all of these use cases: how to ensure the privacy of gaze data.
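To get a feel for that scale, here is a minimal back-of-the-envelope sketch in Python; the sampling rates are typical values assumed for illustration, not figures from the project:

```python
# Back-of-the-envelope: how many raw samples does 5 minutes of recording yield?
# The sampling rates below are assumptions: consumer devices often run at
# 30-120 Hz, research-grade trackers at 250 Hz or more.
RECORDING_MINUTES = 5

for hz in (60, 120, 250):
    samples = hz * RECORDING_MINUTES * 60  # samples/second * seconds
    print(f"{hz:>4} Hz for {RECORDING_MINUTES} min -> {samples:,} raw gaze samples")

# Even at 60 Hz this is 18,000 samples, each typically carrying a timestamp,
# x/y gaze coordinates, and pupil size -- before any event detection
# (fixations, saccades, blinks) is applied.
```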

What is special about gaze data that makes data security particularly relevant for it?

Eye tracking data is inherently biometric, meaning that the patterns of eye movements can be as distinctive as a fingerprint. Even when datasets have been anonymized, i.e. stripped of names or other personally identifiable details, research has shown that just a few seconds of gaze data may be enough to re-identify an individual. This makes gaze data particularly sensitive and different from many other types of behavioural data. 

Beyond the risk of losing anonymity, gaze data can potentially reveal a surprising amount of sensitive personal information. Studies have demonstrated that eye movement patterns can be used to infer a person’s age, gender, aspects of their personality, cognitive abilities, emotional states, levels of fatigue or stress, and implicit preferences or biases. In some cases, eye tracking data has even been linked to health conditions, making it a potential source of medical or psychological insights. Additionally, research suggests that gaze behaviour can indicate interests, fears, and even sexual orientation, information that a person may not wish to disclose or may not even be consciously aware of.

So even when no video of the eye is stored and no personal information is known, people can be identified? How can such information be drawn from this data?

Yes, even without video recordings of the eye or any direct personal identifiers, individuals may still be recognized from their eye tracking data. Eye movement patterns, such as saccades, fixations, and scanpaths, are remarkably stable over time and differ from person to person in ways that algorithms can detect. Research has demonstrated that machine learning models can reliably match a person’s gaze patterns across sessions, even when no other identifying details (like a name or address) are present.
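As a rough illustration of how such an attack works in principle, the toy simulation below gives each simulated person a stable “gaze signature” and then tries to re-identify them from later sessions. The features, parameters, and simulated data are illustrative assumptions, not any published attack model:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def session_features(fix_ms, sacc_deg):
    """Collapse one recording session into a tiny feature vector.
    Real attacks use far richer scanpath descriptors; this is illustrative."""
    return [fix_ms.mean(), fix_ms.std(), sacc_deg.mean(), sacc_deg.std()]

def simulate_session(mean_fix, mean_sacc, n=200):
    """Draw fixation durations (ms) and saccade amplitudes (deg) around one
    person's stable, habitual 'gaze signature'."""
    return rng.normal(mean_fix, 30, n), rng.normal(mean_sacc, 0.8, n)

X_train, y_train, X_test, y_test = [], [], [], []
for person in range(20):
    mean_fix = rng.uniform(150, 400)   # this person's typical fixation duration
    mean_sacc = rng.uniform(2.0, 8.0)  # and typical saccade amplitude
    for _ in range(5):                 # earlier, 'enrolment' sessions
        X_train.append(session_features(*simulate_session(mean_fix, mean_sacc)))
        y_train.append(person)
    for _ in range(2):                 # later sessions to be re-identified
        X_test.append(session_features(*simulate_session(mean_fix, mean_sacc)))
        y_test.append(person)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"Re-identification accuracy: {clf.score(X_test, y_test):.0%} "
      f"(random guessing: 5%)")
```

Because the per-person signatures are stable relative to within-session variation, even these four crude features are enough for the classifier to match sessions to individuals far above chance, which is the core of the re-identification risk.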

Furthermore, once these unique eye movement patterns are established, additional inferences become possible. As previously mentioned, one could estimate an individual’s age range, gender, or certain health conditions, potentially narrowing down their identity further. By combining this inferred information with external datasets, it becomes increasingly likely that the data can be de-anonymized, revealing more about the individual’s identity, background, or lifestyle.

Are there already attempts to overcome this issue?

Yes. Traditional anonymization methods, such as removing personally identifiable information or pseudonymizing user IDs, have been tried, but as mentioned above they often fail in the context of eye tracking. This has prompted researchers to explore stronger privacy-preserving techniques.

One commonly researched method is differential privacy, which adds carefully calibrated noise to gaze data to reduce the likelihood that anyone can link specific eye movement patterns to a single individual. Unlike traditional approaches, it offers a mathematical guarantee: the result of an analysis changes only negligibly whether or not any single person’s data is included, so little can be inferred about that person. However, it can be limited in situations requiring high-precision gaze measurements, as the added noise reduces data quality.
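A minimal sketch of this idea, using the classic Laplace mechanism to release one aggregate gaze statistic; the bounds, epsilon values, and data here are illustrative assumptions, not parameters from any Eyes4ICU study:

```python
import numpy as np

def laplace_noisy_mean(values, lower, upper, epsilon, rng):
    """Release the mean of a bounded gaze statistic with epsilon-differential
    privacy via the Laplace mechanism. Clipping to [lower, upper] bounds each
    person's influence; the noise scale is sensitivity / epsilon."""
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)  # max change from one record
    return clipped.mean() + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(42)
fixation_ms = rng.normal(260, 40, size=500)  # toy fixation durations (ms)

for eps in (0.1, 1.0, 10.0):
    released = laplace_noisy_mean(fixation_ms, 50, 800, eps, rng)
    print(f"epsilon={eps:>4}: true mean {fixation_ms.mean():.1f} ms, "
          f"released {released:.1f} ms")

# Smaller epsilon = stronger privacy but noisier output: exactly the trade-off
# between data utility and protection described above.
```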

Although differential privacy can be useful in some cases, it is not a universal solution and currently, no single method can guarantee complete privacy and data security for every eye tracking application. Ongoing research aims to refine existing methods, or develop entirely new approaches, but it remains unclear which solutions will ultimately provide the best balance between data utility and user privacy.

What is your approach to tackle this issue? 

My approach builds on the following assumption: to determine the most effective privacy measures for eye tracking, we must first establish clarity about the specific threats that exist. Although previous research has demonstrated many privacy risks, these findings have often been limited to very narrow, controlled circumstances. Similarly, while numerous privacy-protection methods have been proposed, it is frequently unclear which exact threats they mitigate, how well they do so under varying conditions, or whether they can scale to more complex real-world scenarios.

To address this gap, I plan to systematically investigate eye tracking tasks under a wide range of conditions, encompassing, for example, cognitive load monitoring setups and concussion detection studies. By analysing how privacy risks arise and evolve in these varied contexts, I hope to develop a robust, repeatable framework for identifying, quantifying, and comparing the threats that eye tracking data may pose. This methodology would then be applicable across diverse experimental designs and setups, offering a practical toolkit that researchers and developers can use to quantify and address threats in their own scenarios.
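To hint at what such a framework’s output might look like, here is a purely hypothetical sketch; the interface, names, and numbers are illustrative assumptions, not David’s actual design:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ThreatReport:
    """One row of a threat assessment: attack success under one condition."""
    condition: str          # e.g. "concussion-detection study, raw data"
    attack_accuracy: float  # how often the simulated attacker succeeds
    chance_level: float     # random-guessing baseline

    @property
    def lift(self) -> float:
        """Attack accuracy relative to chance; 1.0 means no measurable threat."""
        return self.attack_accuracy / self.chance_level

def assess(condition: str, attack: Callable[[], float],
           n_identities: int) -> ThreatReport:
    """Run one simulated attack under one experimental condition. Sweeping this
    over tasks, devices, and protection methods would yield the comparable,
    repeatable threat scores described above."""
    return ThreatReport(condition, attack(), 1.0 / n_identities)

# Hypothetical numbers for the same task with and without protection:
raw = assess("cognitive-load task, raw data", lambda: 0.90, n_identities=20)
dp = assess("cognitive-load task, differentially private", lambda: 0.12,
            n_identities=20)
for report in (raw, dp):
    print(f"{report.condition}: lift over chance = {report.lift:.1f}x")
```

Expressing every threat as a lift over chance is one possible design choice: it makes attacks on different tasks, devices, and protection methods directly comparable on a single scale.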

How long do you think it will take for such measures to be implemented in consumer devices? 

Many current privacy-preserving methods introduce noise, lag, or other distortions, while many applications rely on highly precise eye tracking data, so it is unlikely that these techniques will be built directly into the hardware or firmware of consumer devices in the near future. It is also important to note that high-precision data, on its own, does not inherently pose a privacy threat. When data is only collected and stored, without being actively processed or shared, and is secured by strong encryption, access controls, and other safeguards on a local device, the potential for privacy violations is minimal.

However, the real risk emerges from how this data is processed and shared. Privacy threats quickly arise if data is sent to third parties without adequate protections, analysed and released with insufficient anonymization procedures, or aggregated with additional datasets that enable re-identification. For these reasons, privacy protection is more likely, and more beneficial, when data collectors (such as large corporations or research institutions) implement rigorous privacy measures as a standard practice, rather than leaving it up to individual devices. In fact, much like GDPR sets legal requirements for data handling, the integration of robust privacy-enhancing methods should ideally be mandated by law, rather than merely recommended. This is exactly why my ultimate aim is to provide a flexible, easy-to-use toolkit that researchers and organizations can adapt to safeguard eye tracking data.

Do you have any advice for the usage of consumer devices with built-in eye tracking, like the iPhone, whose latest accessibility features include eye tracking?

The first step is always to educate yourself on potential privacy risks and the rights you have under laws like GDPR. Built-in eye tracking features, such as those in modern smartphones and tablets, can offer convenience and accessibility, but they also introduce new concerns about how gaze data is collected, stored, and shared. Users should be mindful of app permissions and carefully review which applications have access to eye tracking data.

Ultimately, while consumer eye tracking technology is becoming more widespread, the privacy risks it poses are not yet fully addressed by industry standards. Until stricter regulations or better privacy-preserving techniques are implemented, users should take an active role in managing their own data security, whether by adjusting settings or limiting unnecessary data sharing.

David Kovacs investigates how to handle sensitive eye information