Optimal Pub Finder for Meetups • Daniel Herman

The Problem

You and your friends live in Prague, planning to meet friends at one place. Where should you meet? The naive answer might be “somewhere in the middle,” but that doesn’t account for Prague’s complex public transport network. Some stops are well-connected, others require multiple transfers.

The Solution

I built pub-finder, a web application that finds the optimal meeting point for groups of friends in Prague. Given $k$ starting stops (e.g., Krymská, Anděl, Muzeum), the app finds the target stop that minimizes travel time for everyone. We assume that near every stop there is a pub available in Prague (which might be bolt assumption), but you can use it in general to find the closest stop to everyone.

Mathematical Formulation

Given a set of $k$ starting stops $S = \{s_1, s_2, \ldots, s_k\}$ , we want to find an optimal target stop $t^* \in T$ (where $T$ is the set of all 1463 stops in Prague) that minimizes travel distance/time for all participants.

We define two distance metrics:

Geographic Distance: $d_{\text{geo}}(s_i, t) = \|\text{GPS}(s_i) - \text{GPS}(t)\|$ (distance between stops on a globe in kilometers)
Temporal Distance: $d_{\text{time}}(s_i, t, dt) = \text{travel\_time}(s_i \to t, dt)$ (minutes to travel from $s_i$ to $t$ at datetime $dt$ )

The app offers two optimization objectives:

1. Minimize Worst Case (Min-Max)

Find $t^*$ that minimizes the maximum travel time:

$t^* = \underset{t \in T}{\text{argmin}} \max_{i=1}^{k} d(s_i, t)$

This ensures no one has an unreasonably long commute.

2. Minimize Total Time (Sum)

Find $t^*$ that minimizes the sum of all travel times:

$t^* = \underset{t \in T}{\text{argmin}} \sum_{i=1}^{k} d(s_i, t)$

This minimizes the collective travel burden.

Two-Stage Optimization Algorithm

The challenge: we can’t query real-time travel times for all $1463 \times k$ starting points in real-time (that would be thousands of API calls). Instead, I use a two-stage approach:

Stage 1: Candidate Selection

Pre-compute geographic distances for all stop pairs ( $O(n^2)$ space)
Pre-scrape temporal distances for ~2.1M stop combinations from IDOS/DPP APIs
Select top $N_{\text{geo}}$ candidates based on geographic distance: $C_{\text{geo}} = \text{top}_{N_{\text{geo}}}(\underset{t \in T}{\text{argmin}} f(d_{\text{geo}}(s_i, t)))$
Select top $N_{\text{time}}$ candidates based on pre-scraped temporal distance: $C_{\text{time}} = \text{top}_{N_{\text{time}}}(\underset{t \in T}{\text{argmin}} f(d_{\text{time}}(s_i, t, dt_{\text{ref}})))$
Union: $C = C_{\text{geo}} \cup C_{\text{time}}$ (typically ~35 candidates)

Stage 2: Real-Time Refinement

For each candidate $c \in C$ $c \in C$ , query actual travel times for the user-specified datetime $dt$ $d t$ :
- $d_{\text{actual}}(s_i, c, dt) = \text{get\_total\_minutes}(s_i, c, dt)$ for all $i \in \{1, \ldots, k\}$
Compute objective values:
- $f_{\text{worst}}(c) = \max(d_{\text{actual}}(s_1, c, dt), \ldots, d_{\text{actual}}(s_k, c, dt))$
- $f_{\text{total}}(c) = \sum_{i=1}^{k} d_{\text{actual}}(s_i, c, dt)$
Sort candidates by selected objective and return top $N_{\text{display}}$ results

This reduces API calls from $k \times 1463$ to $k \times |C|$ (typically $k \times 35$ ), making the app responsive while maintaining accuracy.

Technical Implementation

Data Collection

The hardest part was collecting the data. Prague has 1463 public transport stops, which means $1463^2 \approx 2.1\text{M}$ stop pairs (accounting for bidirectional travel). I scraped travel times from two sources:

IDOS API (idos.cz) - More reliable, lower error rate
DPP API (dpp.cz) - Slightly higher error rate, used as fallback

The scraping process:

Used multiprocessing with 50 parallel workers
Implemented retry logic with exponential backoff
Scraped all combinations for a reference datetime (Friday 20:00)
Stored results in Parquet format for efficient querying

Architecture

Frontend: Gradio (Python web framework) - simple, fast to prototype
Data Processing: Polars (fast DataFrame library, Rust-based)
Deployment: Docker container, runs on a VPS

Performance Optimizations

Pre-computed distance matrix: $O(1)$ lookups for geographic distances
Candidate filtering: Reduces search space from 1463 to ~35 stops
Caching: API responses cached for 24 hours
Parallel processing: Multiple API calls executed concurrently
Early termination: Stop querying once we have enough good candidates

Try It Out

Check out the live demo: pub-finder.hermandaniel.com

Source code: github.com/detrin/pub-finder

Share your feedback →