Data Science meets Data Protection: Keeping your data secure while learning from it.

We'll look at some simple, practical ways to keep your data secure and well-protected while working with it. We will investigate techniques such as pseudonymization and anonymization, and show how you can apply them to your data and still get useful insights from it.

Tags: Artificial Intelligence, Business & Start-Ups, Big Data, Data Science

Scheduled on Thursday 14:50 in room lecture


Katharine Jarmul

Katharine Jarmul is a data scientist and engineer based in Berlin, Germany. She runs a consulting company, Kjamistan, where she works with large and small companies to investigate, build and evaluate solutions to data problems. She enjoys teaching, has made several data-focused online courses, and has co-authored a book for O'Reilly Media. As a co-founder of PyLadies, she is passionate about diversity in technology and the (Py)Data community. When she's not Pythoning or data wrangling, she's likely cooking, taking photos or plotting how to change the world (on|off)line.

Andreas Dewes (kiprotect)

Data Scientist & Founder @ 7scientists. PhD in Experimental Quantum Physics.


We will discuss anonymization and pseudonymization techniques that you can apply to your data to keep it secure and comply with the law(s) while still being able to gain useful insights from it.

  • Why protect data?
  • Pseudonymization vs. anonymization: What's the difference?
  • Pseudonymization: Techniques & real-world examples
  • Problems and risks when pseudonymizing data
  • Anonymization: Approaches & real-world examples
  • Problems and risks when anonymizing data
  • Takeaways and summary
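
To give a flavor of the pseudonymization topic in the outline above, here is a minimal sketch that replaces a direct identifier with a keyed hash (HMAC-SHA256). It is an illustration only, not necessarily the approach shown in the talk: the function name `pseudonymize`, the key `SECRET_KEY` and the sample records are all made up for this example.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption): in practice it would be generated
# randomly and stored separately from the pseudonymized data.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability in this sketch.
    return digest.hexdigest()[:16]


# Fictional example records.
records = [
    {"email": "alice@example.com", "purchase": 12.50},
    {"email": "bob@example.com", "purchase": 7.00},
    {"email": "alice@example.com", "purchase": 3.25},
]

# Replace the e-mail address with its pseudonym; other fields are kept.
pseudonymized = [
    {"user": pseudonymize(r["email"]), "purchase": r["purchase"]}
    for r in records
]

for row in pseudonymized:
    print(row)
```

The same input always yields the same pseudonym, so per-user aggregation still works. Anyone holding the key can recompute the mapping, though, which is one reason pseudonymized data is generally still treated as personal data.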

We will show concrete Python implementations of various techniques and use example data sets to show how applying pseudonymization and anonymization will affect our ability to do machine learning / data science.
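
As a rough sketch of how anonymization trades detail for protection, the snippet below generalizes two quasi-identifiers and measures the size of the smallest group of indistinguishable records (the "k" in k-anonymity). k-anonymity is used here only as one well-known example; the abstract does not say which techniques the talk covers. The helper names `generalize` and `smallest_group` and the data are hypothetical.

```python
from collections import Counter


def generalize(record):
    """Coarsen quasi-identifiers: 10-year age bands, ZIP code cut to 3 digits."""
    low = (record["age"] // 10) * 10
    return (f"{low}-{low + 9}", record["zip"][:3] + "**")


def smallest_group(records, key_fn):
    """Return the size of the smallest group of records sharing the same key."""
    counts = Counter(key_fn(r) for r in records)
    return min(counts.values())


# Fictional example records.
records = [
    {"age": 23, "zip": "10115", "diagnosis": "flu"},
    {"age": 27, "zip": "10117", "diagnosis": "cold"},
    {"age": 34, "zip": "10243", "diagnosis": "flu"},
    {"age": 38, "zip": "10245", "diagnosis": "asthma"},
]

# Group size using exact age and ZIP code versus the generalized values.
raw_k = smallest_group(records, lambda r: (r["age"], r["zip"]))
generalized_k = smallest_group(records, generalize)

print("k before generalization:", raw_k)
print("k after generalization:", generalized_k)
```

Coarser values make each record harder to single out, but a model trained on the generalized data only sees age bands and ZIP prefixes, which is the kind of utility loss the talk sets out to examine.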