Distributed Hyperparameter Search with sklearn and Kubernetes

In this talk, I will show how you can harness the Kubernetes scheduler to distribute a hyperparameter search with sklearn across a cluster of nodes. This takes only a few changes to the original code, so the data scientist does not have to deal with complex Kubernetes internals.

Tags: Algorithms, Big Data, Data Science, DevOps, Infrastructure, Machine Learning

Scheduled on Wednesday at 17:10 in room media

Speaker

Jakob Karalus (@krallistic)

Data Scientist/Consultant for codecentric.

Description

While sklearn provides a good interface for hyperparameter search over large and complex models (pipelines), running such a search can take a lot of time. The traditional approach usually involves one beefy machine and a lot of waiting. In other cases, people “manually” split parameter ranges across nodes, but that can be problematic because the nodes don't talk to each other. Kubernetes is currently the most prominent scheduler and shines at distributing tasks, but it is a pretty complex system in itself.
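To make the "manual" split concrete, here is a minimal sketch of how a parameter grid could be divided across workers by index. It is not the approach presented in the talk, only an illustration of the baseline it improves on. The names `expand_grid` and `shard_for_worker`, the example grid, and the environment variables `WORKER_INDEX` and `NUM_WORKERS` are all hypothetical; sklearn's `ParameterGrid` performs a similar expansion of a grid into individual candidates.

```python
import itertools
import os


def expand_grid(param_grid):
    """Return every parameter combination as a dict, in a deterministic order."""
    names = sorted(param_grid)
    return [
        dict(zip(names, values))
        for values in itertools.product(*(param_grid[name] for name in names))
    ]


def shard_for_worker(candidates, worker_index, num_workers):
    """Round-robin split: worker i takes candidates i, i + n, i + 2n, ..."""
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return candidates[worker_index::num_workers]


# Hypothetical search space, chosen only for illustration.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
candidates = expand_grid(param_grid)

# Assumed environment variables; each worker process would be started
# with its own WORKER_INDEX and a shared NUM_WORKERS.
worker_index = int(os.environ.get("WORKER_INDEX", "0"))
num_workers = int(os.environ.get("NUM_WORKERS", "3"))

my_candidates = shard_for_worker(candidates, worker_index, num_workers)
for params in my_candidates:
    # A real worker would fit and score a model with these parameters here.
    print(params)
```

Every worker computes its share independently, which is exactly the weakness described above: no worker knows what the others have found, so results still have to be collected and compared by hand.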

In this talk, I will show how you can harness the Kubernetes scheduler to distribute a hyperparameter search with sklearn across a cluster of nodes. This takes only a few changes to the original code, so the data scientist does not have to deal with complex Kubernetes internals.