Solving Data Science Problems using a Jupyter Notebook and SAP HANA's in-database Machine Learning Libraries

In this talk we will present how a Data Scientist can work on datasets stored in the SAP HANA database by leveraging its in-database machine learning libraries. Data will reside in the database and calculations will be pushed down to the database, minimizing data transfer to the client.

Tags: Big Data, Data Science, Jupyter, Machine Learning, Visualisation

Scheduled on Thursday 16:35 in room cubus

Speaker

Dr Frank Gottfried

Frank Gottfried is a Development Architect at SAP SE. He holds a Ph.D. in Physics from the University of Heidelberg and a master’s degree in IP Management and Law from the University of Strasbourg. He joined SAP in 1996 as a software developer and since then has gained experience in various technical and management functions and consulting. For the last few years he’s been working on machine learning topics using SAP HANA’s ML libraries and Deep Learning frameworks (TensorFlow).

Description

Companies store their data in databases with highly restricted access regulations. The latest regulatory changes reinforce the need to work on the datasets within this controlled environment without creating additional external copies. However, Data Scientists prefer to work with the tools they are most familiar with, such as Python, R and Jupyter Notebooks, drawing on a large number of open-source packages (numpy, matplotlib, pandas, ...). SAP HANA provides highly optimized in-database machine learning libraries. In this talk we will present how Data Scientists can work in the environment they are most familiar with and access the data stored in SAP HANA using SAP HANA machine learning libraries with a scikit-learn-style interface. Data will remain in the database and will be exposed as dataframes (similar to Pandas dataframes). We will explain the software architecture and present a complete end-to-end use case in a Jupyter Notebook.
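
The push-down idea behind a database-backed dataframe can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not SAP HANA's actual client API: the LazyFrame class, its methods and the sales table are hypothetical, and an in-memory SQLite database stands in for SAP HANA so that the example runs anywhere. The dataframe object holds a SQL statement rather than rows, each operation wraps that statement in a new one, and data leaves the database only when collect() is called.

```python
import sqlite3


class LazyFrame:
    """Hypothetical stand-in for a database-backed dataframe.

    It stores a SQL statement, not data. Nothing is executed until collect().
    """

    def __init__(self, conn, select_statement):
        self.conn = conn
        self.select_statement = select_statement

    def filter(self, condition):
        # Wrap the current statement in a new one; no query runs here.
        sql = f"SELECT * FROM ({self.select_statement}) WHERE {condition}"
        return LazyFrame(self.conn, sql)

    def agg_mean(self, column, group_by):
        # The aggregation is expressed in SQL, so the database computes it.
        sql = (
            f"SELECT {group_by}, AVG({column}) AS mean_{column} "
            f"FROM ({self.select_statement}) "
            f"GROUP BY {group_by} ORDER BY {group_by}"
        )
        return LazyFrame(self.conn, sql)

    def collect(self):
        # Only now is the statement executed and the (small) result transferred.
        return self.conn.execute(self.select_statement).fetchall()


# Assumption: SQLite replaces SAP HANA purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0), ("south", 100.0)],
)

sales = LazyFrame(conn, "SELECT * FROM sales")
result = sales.filter("amount < 50").agg_mean("amount", "region")

print(result.select_statement)  # the SQL that will be pushed down
rows = result.collect()         # two aggregated rows reach the client
print(rows)
```

In this sketch the client receives only the aggregated rows, while the raw table never leaves the database. A scikit-learn-style estimator on top of such a dataframe would follow the same pattern, with fit() triggering a computation inside the database instead of pulling the training data to the client.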