Suggestions from Python and Solr

Trying to guess what a user wants when she's typing something into the search box of our price comparison website is a surprisingly complex endeavour. Our solution is based on the Solr SuggestComponent, heavily fortified with Python logic.

Tags: Algorithms

Scheduled on Friday at 11:20 in room cubus

Speaker

Patrick Schemitz

Patrick is a Senior Scientist at solute GmbH. An avid Pythonista since 2003, his main responsibility is the billiger.de search functionality, which he (co-)wrote using first Lucene, later Solr and now SolrCloud. Besides that, he wrote much of the SVM-based offer categorization at billiger.de and has a keen interest in machine learning. Patrick holds a Ph.D. in particle physics from Karlsruhe University.

Jonathan Oberländer (L3viathan2142)

Jonathan started programming at the tender age of 12, after accidentally buying a book about C++. He quickly moved on to other languages (VBScript, AutoIt, PHP, JavaScript, Perl, ...), but it wasn't until his Bachelor's studies in Computational Linguistics (Saarland University) that he started learning Python, after having to choose between it and a Java course. Since then, he has mostly stayed true to Python, except for the occasional affair with esoteric programming languages. After finishing his Master's in Cognitive Science (Trento) and Computer Science (Prague), he started working as a full-time Python developer at the German price comparison website billiger.de.

Description

When a user types a query into the search box of our price comparison website, we try to figure out what they are searching for, and provide suggestions as they type along. What product, what brand, from which categories? Solr provides a SuggestComponent that is a good start, but in a lot of situations we need fallback strategies: what should we show to a user searching for just a brand name? Or for a single offer we can't actually show them? What alternatives can we dig up? And behind all this backfill logic lurks that dreaded question: what amount of irrelevant garbage is worse than the horror vacui of an empty result set?
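To make the idea of fallback strategies concrete, here is a minimal sketch of a backfill chain in Python. It is not the billiger.de implementation: the names `Suggestion`, `suggest`, `CATALOG`, the three toy strategies and the `min_score` threshold are all assumptions made up for illustration, and the in-memory catalog stands in for whatever Solr would return. The point is only the shape of the logic: try the best strategy first, backfill from weaker ones, and let a score threshold decide when an empty slot is better than a bad suggestion.

```python
from typing import Callable, Iterable, List, NamedTuple


class Suggestion(NamedTuple):
    text: str
    score: float  # hypothetical relevance in [0, 1], not a Solr weight
    source: str   # name of the strategy that produced the suggestion


Strategy = Callable[[str], Iterable[Suggestion]]

# Toy stand-in for the real index; purely illustrative data.
CATALOG = ["Acme Phone X", "Acme Phone Y", "Acme Tablet", "Bolt Phone"]


def prefix_matches(query: str) -> List[Suggestion]:
    """Primary strategy: catalog entries that start with the typed text."""
    q = query.lower()
    return [Suggestion(name, 1.0, "prefix")
            for name in CATALOG if name.lower().startswith(q)]


def word_matches(query: str) -> List[Suggestion]:
    """Fallback: entries sharing at least one whole word with the query."""
    words = query.lower().split()
    return [Suggestion(name, 0.5, "word")
            for name in CATALOG
            if any(w in name.lower().split() for w in words)]


def popular_items(query: str) -> List[Suggestion]:
    """Last resort: something unrelated but popular, scored very low."""
    return [Suggestion("Acme Tablet", 0.1, "popular")]


def suggest(query: str, strategies: List[Strategy],
            limit: int = 5, min_score: float = 0.3) -> List[Suggestion]:
    """Run strategies in order, backfilling until `limit` is reached.

    Suggestions scoring below `min_score` are dropped, so the result
    may stay short or empty rather than fill up with garbage.
    """
    results: List[Suggestion] = []
    seen = set()
    for strategy in strategies:
        for s in strategy(query):
            key = s.text.lower()
            if s.score < min_score or key in seen:
                continue  # too weak, or already suggested by a better strategy
            seen.add(key)
            results.append(s)
            if len(results) >= limit:
                return results
    return results


CHAIN = [prefix_matches, word_matches, popular_items]
for s in suggest("acme ph", CHAIN):
    print(s.source, s.text)
```

In this sketch, `min_score` is the knob behind the "garbage versus empty result" question: lowering it to `0.0` lets the `popular_items` backfill through for a query like `"zzz"`, while the default leaves the result set empty.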