Back

/ 3 min read

Optimal Pub Finder for Meetups

The Problem

You and your friends live in Prague, planning to meet friends at one place. Where should you meet? The naive answer might be “somewhere in the middle,” but that doesn’t account for Prague’s complex public transport network. Some stops are well-connected, others require multiple transfers.

The Solution

I built pub-finder, a web application that finds the optimal meeting point for groups of friends in Prague. Given kk starting stops (e.g., Krymská, Anděl, Muzeum), the app finds the target stop that minimizes travel time for everyone. We assume that near every stop there is a pub available in Prague (which might be bolt assumption), but you can use it in general to find the closest stop to everyone.

Mathematical Formulation

Given a set of kk starting stops S={s1,s2,,sk}S = \{s_1, s_2, \ldots, s_k\}, we want to find an optimal target stop tTt^* \in T (where TT is the set of all 1463 stops in Prague) that minimizes travel distance/time for all participants.

We define two distance metrics:

  1. Geographic Distance: dgeo(si,t)=GPS(si)GPS(t)d_{\text{geo}}(s_i, t) = \|\text{GPS}(s_i) - \text{GPS}(t)\| (distance between stops on a globe in kilometers)
  2. Temporal Distance: dtime(si,t,dt)=travel_time(sit,dt)d_{\text{time}}(s_i, t, dt) = \text{travel\_time}(s_i \to t, dt) (minutes to travel from sis_i to tt at datetime dtdt)

The app offers two optimization objectives:

1. Minimize Worst Case (Min-Max)

Find tt^* that minimizes the maximum travel time:

t=argmintTmaxi=1kd(si,t)t^* = \underset{t \in T}{\text{argmin}} \max_{i=1}^{k} d(s_i, t)

This ensures no one has an unreasonably long commute.

2. Minimize Total Time (Sum)

Find tt^* that minimizes the sum of all travel times:

t=argmintTi=1kd(si,t)t^* = \underset{t \in T}{\text{argmin}} \sum_{i=1}^{k} d(s_i, t)

This minimizes the collective travel burden.

Two-Stage Optimization Algorithm

The challenge: we can’t query real-time travel times for all 1463×k1463 \times k starting points in real-time (that would be thousands of API calls). Instead, I use a two-stage approach:

Stage 1: Candidate Selection

  • Pre-compute geographic distances for all stop pairs (O(n2)O(n^2) space)
  • Pre-scrape temporal distances for ~2.1M stop combinations from IDOS/DPP APIs
  • Select top NgeoN_{\text{geo}} candidates based on geographic distance: Cgeo=topNgeo(argmintTf(dgeo(si,t)))C_{\text{geo}} = \text{top}_{N_{\text{geo}}}(\underset{t \in T}{\text{argmin}} f(d_{\text{geo}}(s_i, t)))
  • Select top NtimeN_{\text{time}} candidates based on pre-scraped temporal distance: Ctime=topNtime(argmintTf(dtime(si,t,dtref)))C_{\text{time}} = \text{top}_{N_{\text{time}}}(\underset{t \in T}{\text{argmin}} f(d_{\text{time}}(s_i, t, dt_{\text{ref}})))
  • Union: C=CgeoCtimeC = C_{\text{geo}} \cup C_{\text{time}} (typically ~35 candidates)

Stage 2: Real-Time Refinement

  • For each candidate cCc \in C, query actual travel times for the user-specified datetime dtdt:
    • dactual(si,c,dt)=get_total_minutes(si,c,dt)d_{\text{actual}}(s_i, c, dt) = \text{get\_total\_minutes}(s_i, c, dt) for all i{1,,k}i \in \{1, \ldots, k\}
  • Compute objective values:
    • fworst(c)=max(dactual(s1,c,dt),,dactual(sk,c,dt))f_{\text{worst}}(c) = \max(d_{\text{actual}}(s_1, c, dt), \ldots, d_{\text{actual}}(s_k, c, dt))
    • ftotal(c)=i=1kdactual(si,c,dt)f_{\text{total}}(c) = \sum_{i=1}^{k} d_{\text{actual}}(s_i, c, dt)
  • Sort candidates by selected objective and return top NdisplayN_{\text{display}} results

This reduces API calls from k×1463k \times 1463 to k×Ck \times |C| (typically k×35k \times 35), making the app responsive while maintaining accuracy.

Technical Implementation

Data Collection

The hardest part was collecting the data. Prague has 1463 public transport stops, which means 146322.1M1463^2 \approx 2.1\text{M} stop pairs (accounting for bidirectional travel). I scraped travel times from two sources:

  1. IDOS API (idos.cz) - More reliable, lower error rate
  2. DPP API (dpp.cz) - Slightly higher error rate, used as fallback

The scraping process:

  • Used multiprocessing with 50 parallel workers
  • Implemented retry logic with exponential backoff
  • Scraped all combinations for a reference datetime (Friday 20:00)
  • Stored results in Parquet format for efficient querying

Architecture

  • Frontend: Gradio (Python web framework) - simple, fast to prototype
  • Data Processing: Polars (fast DataFrame library, Rust-based)
  • Deployment: Docker container, runs on a VPS

Performance Optimizations

  1. Pre-computed distance matrix: O(1)O(1) lookups for geographic distances
  2. Candidate filtering: Reduces search space from 1463 to ~35 stops
  3. Caching: API responses cached for 24 hours
  4. Parallel processing: Multiple API calls executed concurrently
  5. Early termination: Stop querying once we have enough good candidates

Try It Out

Check out the live demo: pub-finder.hermandaniel.com

Source code: github.com/detrin/pub-finder

Share your feedback →