---
title: "Optimal Pub Finder for Meetups"
description: "Wondering where you and your friends should up without travelling too much? This app is for you then."
published: 2025-12-08
tags: ["math"]
---

## The Problem

You and your friends live in Prague, planning to meet friends at one place. Where should you meet? The naive answer might be "somewhere in the middle," but that doesn't account for Prague's complex public transport network. Some stops are well-connected, others require multiple transfers.

## The Solution

I built [pub-finder](https://pub-finder.hermandaniel.com), a web application that finds the optimal meeting point for groups of friends in Prague. Given $k$ starting stops (e.g., Krymská, Anděl, Muzeum), the app finds the target stop that minimizes travel time for everyone. We assume that near every stop there is a pub available in Prague (which might be bolt assumption), but you can use it in general to find the closest stop to everyone.  

### Mathematical Formulation

Given a set of $k$ starting stops $S = \{s_1, s_2, \ldots, s_k\}$, we want to find an optimal target stop $t^* \in T$ (where $T$ is the set of all 1463 stops in Prague) that minimizes travel distance/time for all participants.

We define two distance metrics:

1. **Geographic Distance**: $d_{\text{geo}}(s_i, t) = \|\text{GPS}(s_i) - \text{GPS}(t)\|$ (distance between stops on a globe in kilometers)
2. **Temporal Distance**: $d_{\text{time}}(s_i, t, dt) = \text{travel\_time}(s_i \to t, dt)$ (minutes to travel from $s_i$ to $t$ at datetime $dt$)

The app offers two optimization objectives:

#### 1. Minimize Worst Case (Min-Max)
Find $t^*$ that minimizes the maximum travel time:

$$t^* = \underset{t \in T}{\text{argmin}} \max_{i=1}^{k} d(s_i, t)$$

This ensures no one has an unreasonably long commute.

#### 2. Minimize Total Time (Sum)
Find $t^*$ that minimizes the sum of all travel times:

$$t^* = \underset{t \in T}{\text{argmin}} \sum_{i=1}^{k} d(s_i, t)$$

This minimizes the collective travel burden.

### Two-Stage Optimization Algorithm

The challenge: we can't query real-time travel times for all $1463 \times k$ starting points in real-time (that would be thousands of API calls). Instead, I use a two-stage approach:

**Stage 1: Candidate Selection**
- Pre-compute geographic distances for all stop pairs ($O(n^2)$ space)
- Pre-scrape temporal distances for ~2.1M stop combinations from IDOS/DPP APIs
- Select top $N_{\text{geo}}$ candidates based on geographic distance: $C_{\text{geo}} = \text{top}_{N_{\text{geo}}}(\underset{t \in T}{\text{argmin}} f(d_{\text{geo}}(s_i, t)))$
- Select top $N_{\text{time}}$ candidates based on pre-scraped temporal distance: $C_{\text{time}} = \text{top}_{N_{\text{time}}}(\underset{t \in T}{\text{argmin}} f(d_{\text{time}}(s_i, t, dt_{\text{ref}})))$
- Union: $C = C_{\text{geo}} \cup C_{\text{time}}$ (typically ~35 candidates)

**Stage 2: Real-Time Refinement**
- For each candidate $c \in C$, query actual travel times for the user-specified datetime $dt$:
  - $d_{\text{actual}}(s_i, c, dt) = \text{get\_total\_minutes}(s_i, c, dt)$ for all $i \in \{1, \ldots, k\}$
- Compute objective values:
  - $f_{\text{worst}}(c) = \max(d_{\text{actual}}(s_1, c, dt), \ldots, d_{\text{actual}}(s_k, c, dt))$
  - $f_{\text{total}}(c) = \sum_{i=1}^{k} d_{\text{actual}}(s_i, c, dt)$
- Sort candidates by selected objective and return top $N_{\text{display}}$ results

This reduces API calls from $k \times 1463$ to $k \times |C|$ (typically $k \times 35$), making the app responsive while maintaining accuracy.

## Technical Implementation

### Data Collection

The hardest part was collecting the data. Prague has 1463 public transport stops, which means $1463^2 \approx 2.1\text{M}$ stop pairs (accounting for bidirectional travel). I scraped travel times from two sources:

1. **IDOS API** (idos.cz) - More reliable, lower error rate
2. **DPP API** (dpp.cz) - Slightly higher error rate, used as fallback

The scraping process:
- Used multiprocessing with 50 parallel workers
- Implemented retry logic with exponential backoff
- Scraped all combinations for a reference datetime (Friday 20:00)
- Stored results in Parquet format for efficient querying

### Architecture

- **Frontend**: Gradio (Python web framework) - simple, fast to prototype
- **Data Processing**: Polars (fast DataFrame library, Rust-based)
- **Deployment**: Docker container, runs on a VPS

### Performance Optimizations

1. **Pre-computed distance matrix**: $O(1)$ lookups for geographic distances
2. **Candidate filtering**: Reduces search space from 1463 to ~35 stops
3. **Caching**: API responses cached for 24 hours
4. **Parallel processing**: Multiple API calls executed concurrently
5. **Early termination**: Stop querying once we have enough good candidates

## Try It Out

Check out the live demo: [pub-finder.hermandaniel.com](https://pub-finder.hermandaniel.com)

Source code: [github.com/detrin/pub-finder](https://github.com/detrin/pub-finder)

[Share your feedback →](https://docs.google.com/forms/d/e/1FAIpQLSeXq6DXWkjcsgs4XRPN0VnccThMwjDQP2Si25MMB76yW14tZA/viewform?usp=dialog)