The Problem
You and your friends live in Prague, planning to meet friends at one place. Where should you meet? The naive answer might be “somewhere in the middle,” but that doesn’t account for Prague’s complex public transport network. Some stops are well-connected, others require multiple transfers.
The Solution
I built pub-finder, a web application that finds the optimal meeting point for groups of friends in Prague. Given starting stops (e.g., Krymská, Anděl, Muzeum), the app finds the target stop that minimizes travel time for everyone. We assume that near every stop there is a pub available in Prague (which might be bolt assumption), but you can use it in general to find the closest stop to everyone.
Mathematical Formulation
Given a set of starting stops , we want to find an optimal target stop (where is the set of all 1463 stops in Prague) that minimizes travel distance/time for all participants.
We define two distance metrics:
- Geographic Distance: (distance between stops on a globe in kilometers)
- Temporal Distance: (minutes to travel from to at datetime )
The app offers two optimization objectives:
1. Minimize Worst Case (Min-Max)
Find that minimizes the maximum travel time:
This ensures no one has an unreasonably long commute.
2. Minimize Total Time (Sum)
Find that minimizes the sum of all travel times:
This minimizes the collective travel burden.
Two-Stage Optimization Algorithm
The challenge: we can’t query real-time travel times for all starting points in real-time (that would be thousands of API calls). Instead, I use a two-stage approach:
Stage 1: Candidate Selection
- Pre-compute geographic distances for all stop pairs ( space)
- Pre-scrape temporal distances for ~2.1M stop combinations from IDOS/DPP APIs
- Select top candidates based on geographic distance:
- Select top candidates based on pre-scraped temporal distance:
- Union: (typically ~35 candidates)
Stage 2: Real-Time Refinement
- For each candidate , query actual travel times for the user-specified datetime :
- for all
- Compute objective values:
- Sort candidates by selected objective and return top results
This reduces API calls from to (typically ), making the app responsive while maintaining accuracy.
Technical Implementation
Data Collection
The hardest part was collecting the data. Prague has 1463 public transport stops, which means stop pairs (accounting for bidirectional travel). I scraped travel times from two sources:
- IDOS API (idos.cz) - More reliable, lower error rate
- DPP API (dpp.cz) - Slightly higher error rate, used as fallback
The scraping process:
- Used multiprocessing with 50 parallel workers
- Implemented retry logic with exponential backoff
- Scraped all combinations for a reference datetime (Friday 20:00)
- Stored results in Parquet format for efficient querying
Architecture
- Frontend: Gradio (Python web framework) - simple, fast to prototype
- Data Processing: Polars (fast DataFrame library, Rust-based)
- Deployment: Docker container, runs on a VPS
Performance Optimizations
- Pre-computed distance matrix: lookups for geographic distances
- Candidate filtering: Reduces search space from 1463 to ~35 stops
- Caching: API responses cached for 24 hours
- Parallel processing: Multiple API calls executed concurrently
- Early termination: Stop querying once we have enough good candidates
Try It Out
Check out the live demo: pub-finder.hermandaniel.com
Source code: github.com/detrin/pub-finder