Beyond Jupyter Notebooks - Building your own Data Science platform with Python & Docker

This talk will interactively show a potential reference architecture for a self-built data science platform using Python & Docker. The presented services tackle common data science pain points such as model persistence, concurrent model training, scheduled model retraining, and model exposure.

Tags: Artificial Intelligence, Algorithms, Data Science, DevOps, Infrastructure, Jupyter, Machine Learning, Programming, Python

Scheduled on Thursday at 14:00 in room Lounge

Speaker

Joshua Görner

After 5 years of experience in the pharmaceutical industry, Joshua Görner switched to the automotive industry as a Data Scientist for BMW AG. In his current position he specialises in working with sensor data extracted from connected vehicles. His major research interests cover the reproducibility of data science projects and the fusion of data science and modern software engineering.

Description

Interactive notebooks like Jupyter have become increasingly popular in recent years and form the core of many data scientists' workplaces. Accessed via a web browser, they allow scientists to easily structure their work by combining code and documentation.

Yet notebooks often lead to isolated and disposable analysis artefacts. Keeping the computation inside those notebooks does not allow for convenient concurrent model training, model exposure or scheduled model retraining.

These issues can be addressed by taking advantage of recent developments in software engineering. Over the past years, containerization has become the technology of choice for crafting and deploying applications. Building a data science platform that allows for easy access (via notebooks) as well as flexibility and reproducibility (via containerization) combines the best of both worlds and addresses data scientists' hidden needs.