NYCT Subway B Service: Data Exploration

Author

Seamus Joyce-Johnson

Published

January 23, 2026

Background

The B Sixth Avenue Express/Concourse Local/Brighton Express is a New York City Subway service that operates on weekdays (daytime) only. The B typically serves 27 stations from Brighton Beach in the south to 145th St in the north, and trains extend 10 additional stations north to Bedford Park Boulevard at rush hour. It is an express service in Brooklyn and in Manhattan south of 59th St, and a local service in Upper Manhattan (north of 59th St) and the Bronx. The lines served and key merge points are as follows (south to north):

BMT Brighton Line (4 tracks, B uses inner express tracks and Q uses outer local tracks) starting at Brighton Beach
Prospect Park: between Parkside Av and Prospect Park stations, the Brighton Line narrows to two tracks and B and Q trains share tracks (northbound B and Q trains merge north of Parkside), using the outer tracks at DeKalb Av Station.
DeKalb Interlocking: this infamous merge reshuffles the B, D, N, and Q trains before heading over the Manhattan Bridge. The B and D services use the easternmost two tracks on the bridge to access the 6th Avenue Line. Northbound B trains can thus stay on the outermost track through the interlocking (and merge with D trains), but southbound B trains must cross over and merge with Q (and select N) trains.
Manhattan Bridge and Chrystie Street Connection (2 tracks, shared with D)
IND Sixth Avenue Line (4 tracks, B and D use inner express tracks and F and M use outer local tracks) from Chrystie Street Connection to 47th-50th Sts–Rockefeller Center
North of 47th-50th Sts–Rockefeller Center, the express tracks join the Eighth Avenue Line in a flying junction, requiring northbound B trains to merge with C trains on the local track and southbound B trains to merge with D trains on the express tracks.
The B continues as a local service (shared with C trains) on the Eighth Avenue Line until 135th St.
Between 135th and 145th St stations, southbound B trains must merge with southbound C trains. Northbound B trains must merge with northbound D trains (except during the peak).
At 145th St (a three-track station), B trains short turn using the middle track outside of the peak. Departing trains must merge with southbound D trains.
IND Concourse Line: peak hour B trains continue north into the Bronx on the outer local tracks.
Bedford Park Boulevard: B trains terminate and reverse using the center track.

Merge points summary:

Northbound
- Parkside/Prospect Park merge with Q train
- DeKalb interlocking merge with D train
- Columbus Circle merge with C train
- 135th/145th merge with D train (off-peak)
Southbound
- 145th St short turn: departing trains merge with D train (off-peak)
- 145th/135th merge with C train
- Columbus Circle merge with D train
- DeKalb interlocking merge with Q train
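To reuse these merge points alongside the schedule and delay data, they can be encoded as a small lookup table. This is a minimal base-R sketch; the column names (`direction`, `location`, `merges_with`, `off_peak_only`) are my own labels, not fields from any MTA dataset.

```r
# Merge points from the summary above, one row per merge.
# Column names are illustrative, not taken from an MTA dataset.
b_merges <- data.frame(
  direction = rep(c("N", "S"), each = 4),
  location = c(
    "Parkside/Prospect Park", "DeKalb Interlocking", "Columbus Circle", "135th/145th St",
    "145th St (short turn)", "145th/135th St", "Columbus Circle", "DeKalb Interlocking"
  ),
  merges_with = c("Q", "D", "C", "D", "D", "C", "D", "Q"),
  off_peak_only = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# How often the B merges with each partner service, by direction
table(b_merges$direction, b_merges$merges_with)
```

Tabulated this way, the D stands out as the B's most frequent merge partner: four of the eight merges, two in each direction.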

Import data

Code

# note: run the 0_import_data script first to download the data locally
library(arrow)      # open_dataset() for the local parquet datasets
library(tidyverse)  # dplyr, tidyr, ggplot2, forcats, ...
library(lubridate)  # ymd(), interval(), %within%, days()

# these datasets are small enough to load into memory
end_to_end_runtimes <- open_dataset("data/end_to_end_runtimes") |>
  collect() |>
  mutate(yr = as.factor(yr), mo = as.factor(mo))
trains_delayed <- open_dataset("data/trains_delayed") |> collect()
subway_stations <- open_dataset("data/subway_stations") |> collect()
alerts_b_24_25 <- open_dataset("data/alerts_b_24_25") |> collect()
schedule_c_d_q_select <- open_dataset("data/schedule_c_d_q_select") |> collect()

# the B schedule dataset is huge, so we query its parquet files lazily as needed

Visualize runtimes

Let’s start by looking at the runtimes data for the B service. We are confining our analysis to 2024 and 2025. We will also focus on trips between the B’s typical terminals:

Brighton Beach (Station ID D40), aka BBC
145th St, off-peak terminal (Station ID D13), aka 145
Bedford Park Boulevard, peak terminal (Station ID D03), aka BPK

Code

runtimes_b <- end_to_end_runtimes |>
  filter(line == "B" & month >= as.Date("2024-01-01")) |>
  mutate(
    church_ave_work = month %within% interval(
      ymd("2024-08-01"), ymd("2025-02-28")
    )
  ) |>
  arrange(month, time_period, direction)

# filter out short turns
runtimes_b_full_bpb <- runtimes_b |>
  filter(
    (origin_station_name == "BRIGHTON BEACH" &
      destination_station_name == "BEDFORD PARK BLVD") |
    (origin_station_name == "BEDFORD PARK BLVD" &
      destination_station_name == "BRIGHTON BEACH")
  )

runtimes_b_full_145 <- runtimes_b |>
  filter(
    (origin_station_name == "BRIGHTON BEACH" &
      destination_station_name == "145TH STREET") |
    (origin_station_name == "145TH STREET" &
      destination_station_name == "BRIGHTON BEACH")
  )

The number of stops served varies within each origin-destination (OD) pair. Most entries show 37 stops (BPK) or 27 stops (145), but a sizeable subset of each shows 6 additional stops (43 and 33, respectively), and a handful show other counts.

Distribution of entries by number of stops (145):

Code

table(runtimes_b_full_145$number_of_stops) |>
  knitr::kable(col.names = c("Number of stops", "Entries"))

Number of stops	Entries
26	3
27	126
29	1
33	53
35	4

Distribution of entries by number of stops (BPK):

Code

table(runtimes_b_full_bpb$number_of_stops) |>
  knitr::kable(col.names = c("Number of stops", "Entries"))

Number of stops	Entries
36	4
37	147
42	3
43	58
45	5
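To put numbers on "most" and "a subset," the counts from the two tables above can be turned into shares. This base-R sketch copies those counts by hand; `pattern_share()` is a hypothetical helper, and "entries" are rows of the runtime dataset, not individual trips.

```r
# Entry counts by number of stops, copied from the two tables above
stops_145 <- c(`26` = 3, `27` = 126, `29` = 1, `33` = 53, `35` = 4)
stops_bpk <- c(`36` = 4, `37` = 147, `42` = 3, `43` = 58, `45` = 5)

# Share of entries with the regular stop count, with 6 extra stops,
# and with any other count
pattern_share <- function(counts, regular) {
  total <- sum(counts)
  n_regular <- unname(counts[as.character(regular)])
  n_plus_six <- unname(counts[as.character(regular + 6)])
  c(
    regular = n_regular / total,
    plus_six = n_plus_six / total,
    other = (total - n_regular - n_plus_six) / total
  )
}

round(pattern_share(stops_145, regular = 27), 3)
round(pattern_share(stops_bpk, regular = 37), 3)
```

For both terminals, about two-thirds of entries follow the regular pattern and just over a quarter follow the six-extra-stop pattern.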

Trends in average runtimes and service delivery, 2024 vs 2025

Code

# plot 2024 vs 2025 peak, broken down by direction and time of day
runtimes_b |>
  filter(number_of_stops %in% c(37, 43) & time_period %in% c("AM peak", "PM peak")) |>
  ggplot(aes(x = mo, y = average_actual_runtime, fill = yr)) +
  geom_col(
    position = position_dodge(preserve = "single"),
    width = 0.75
    ) +
  scale_x_discrete(labels = month.abb) +
  scale_fill_brewer(palette = "Set2") +
  # scale_alpha_manual(
  #   aes(alpha = church_ave_work)
  #   values = c(1, 0.25),
  #   labels = c("No work", "Church Ave work"),
  #   guide = "none"
  # ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank(),
    legend.position = "bottom"
    ) +
  facet_grid(rows = vars(direction), cols = vars(time_period)) +
  labs(
    title = "NYCT B Runtimes at Peak, 2024 vs 2025",
    y = "Average actual end-to-end runtime (minutes)",
    fill = "Year",
    caption = "Note: B train service was disrupted from Aug 2024-Feb 2025 due to accessibility work at Church Ave.\nB service ran local between Kings Highway and Prospect Park and did not stop at Church Ave."
    )

# plot 2024 vs 2025 off peak, broken down by direction and time of day

# note not many showing because 2024 esp mostly didn't do 37-stop runs
runtimes_b |>
  filter(number_of_stops %in% c(27, 33) & time_period %in% c("midday", "evening")) |>
  ggplot(aes(x = mo, y = average_actual_runtime, fill = yr)) +
  geom_col(
    position = position_dodge(preserve = "single"),
    width = 0.75
    ) +
  scale_x_discrete(labels = month.abb) +
  scale_fill_brewer(palette = "Set1") +
  # scale_alpha_manual(
  #   aes(alpha = church_ave_work),
  #   values = c(1, 0.25),
  #   labels = c("No work", "Church Ave work"),
  #   guide = "none"
  # ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank(),
    legend.position = "bottom"
    ) +
  facet_grid(rows = vars(direction), cols = vars(time_period)) +
  labs(
    title = "NYCT B Runtimes Off Peak, 2024 vs 2025",
    y = "Average actual end-to-end runtime (minutes)",
    fill = "Year",
    caption = "Note: B train service was disrupted from Aug 2024-Feb 2025 due to accessibility work at Church Ave.\nB service ran local between Kings Highway and Prospect Park and did not stop at Church Ave."
    )

Code

# pivot longer to compare service delivery
trains_b_sched_vs_act_long <- runtimes_b |>
  pivot_longer(
    ends_with("_trains"),
    names_to = "type",
    values_to = "n_trains",
    names_pattern = "(.*)_trains"
  ) |>
  mutate(type = factor(type))

# plot 2024 vs 2025 peak, broken down by direction and time of day
trains_b_sched_vs_act_long |>
  filter(number_of_stops %in% c(37, 43) & time_period %in% c("AM peak", "PM peak")) |>
  summarise(n_trains = sum(n_trains), .by = c(yr, mo, type, time_period, church_ave_work)) |>
  ggplot(aes(x = mo, y = n_trains, fill = type, alpha = church_ave_work)) +
  geom_col(
    position = "identity",
    width = 0.75,
    color = "black"
    ) +
  scale_x_discrete(labels = month.abb) +
  scale_fill_manual(values = c(actual = "#FF6319", scheduled = NA)) +
  scale_alpha_manual(
    values = c(1, 0.25),
    labels = c("No work", "Church Ave work"),
    guide = "none"
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank(),
    legend.position = "bottom"
    ) +
  facet_grid(rows = vars(time_period), cols = vars(yr)) +
  labs(
    title = "NYCT B Service Delivered at Peak, 2024 vs 2025",
    y = "Peak trains per month",
    fill = "Type",
    caption = "Note: B train service was disrupted from Aug 2024-Feb 2025 due to accessibility work at Church Ave.\nB service ran local between Kings Highway and Prospect Park and did not stop at Church Ave."
    )

Code

b_service_delivered_peak <- runtimes_b |>
  filter(number_of_stops %in% c(37, 43) & time_period %in% c("AM peak", "PM peak")) |>
  drop_na() |>
  group_by(yr, mo, direction, time_period) |>
  # in shoulder months of the Church Ave work, keep the 43-stop pattern
  slice_max(number_of_stops) |>
  # sum in case more than one row shares the maximum stop count
  summarise(
    prop_of_scheduled_trains = sum(actual_trains) / sum(scheduled_trains),
    pct_of_scheduled_trains = paste0(round(prop_of_scheduled_trains * 100, 0), "%"),
    .groups = "drop"
  )

b_service_delivered_peak |>
  select(!prop_of_scheduled_trains) |>
  head(20) |>
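  knitr::kable(
    align = "llllr",
    col.names = c("Year", "Month", "Direction", "Period", "Share of scheduled trains delivered")
  )

Table 1: B service delivery at peak, first 20 rows

Year	Month	Direction	Period	Share of scheduled trains delivered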
2024	1	N	AM peak	87%
2024	1	N	PM peak	86%
2024	1	S	AM peak	83%
2024	1	S	PM peak	86%
2024	2	N	AM peak	84%
2024	2	N	PM peak	84%
2024	2	S	AM peak	88%
2024	2	S	PM peak	85%
2024	3	N	AM peak	84%
2024	3	N	PM peak	83%
2024	3	S	AM peak	87%
2024	3	S	PM peak	86%
2024	4	N	AM peak	89%
2024	4	N	PM peak	93%
2024	4	S	AM peak	90%
2024	4	S	PM peak	90%
2024	5	N	AM peak	86%
2024	5	N	PM peak	85%
2024	5	S	AM peak	87%
2024	5	S	PM peak	89%

These visualizations revealed a disruption to B service between August 2024 and February 2025. After some Googling, I learned that from August 6, 2024 until February 24, 2025, when B express service resumed, the MTA was conducting accessibility upgrades at the Church Ave station on the BMT Brighton Line. During this period, B trains were diverted to the local tracks between Kings Highway and Prospect Park and therefore made stops at a net of 6 extra stations. (Trains did not stop at Church Ave.)
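Because the runtime data are monthly, August 2024 and February 2025 are only partly affected by the work, which is why both stop patterns can appear in those shoulder months. The `church_ave_work` flag above checks first-of-month dates against an interval. An alternative sketch is to flag a month whenever any of its days overlaps the work window; `month_overlaps_work()` is a hypothetical helper written for illustration.

```r
# Church Ave work window, using the dates given in the text
work_start <- as.Date("2024-08-06")
work_end   <- as.Date("2025-02-24")

# TRUE when any day of the month falls inside the work window.
# `month_start` holds first-of-month Dates, like the monthly runtime data.
month_overlaps_work <- function(month_start) {
  next_month <- do.call(
    c,
    lapply(month_start, function(d) seq(d, by = "month", length.out = 2)[2])
  )
  month_end <- next_month - 1
  month_start <= work_end & month_end >= work_start
}

month_overlaps_work(as.Date(c("2024-07-01", "2024-08-01", "2025-02-01", "2025-03-01")))
```

For first-of-month dates, this flags the same months (August 2024 through February 2025) as the interval check in `runtimes_b`.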

Trends in runtime distribution and performance against schedule

Peak period

Code

# let's look at months when regular service was running: March to July
runtimes_b_box <- runtimes_b |>
  filter(
    number_of_stops %in% c(37, 43) &
      time_period %in% c("AM peak", "PM peak") &
      mo %in% 3:7
  ) |>
  rename(
    avg_actual = average_actual_runtime,
    avg_sched  = average_scheduled_runtime,
    q25 = `_25th_percentile_runtime`,
    q50 = `_50th_percentile_runtime`,
    q75 = `_75th_percentile_runtime`
  ) |>
  # as.numeric() on a factor returns level codes, so convert via character
  mutate(xpos = as.integer(as.character(mo)))

dodge <- position_dodge(width = 0.7)

ggplot(runtimes_b_box, aes(x = xpos)) +
  # IQR boxes
  geom_rect(
    aes(
      xmin = xpos - 0.3,
      xmax = xpos + 0.3,
      ymin = q25,
      ymax = q75,
      fill = yr
    ),
    color = "black",
    position = dodge
  ) +
  # median lines
  geom_rect(
    aes(
      xmin = xpos - 0.3,
      xmax = xpos + 0.3,
      ymin = q50 - 0.08,
      ymax = q50 + 0.08,
      group = yr
    ),
    fill = "black",
    position = dodge
  ) +
  # average actual → X symbol
  geom_point(
    aes(y = avg_actual, shape = "Average actual runtime", group = yr),
    size = 2.2,
    stroke = 0.9,
    position = dodge
  ) +
  # scheduled → O symbol
  geom_point(
    aes(y = avg_sched, shape = "Scheduled runtime", group = yr),
    size = 2.2,
    stroke = 0.9,
    position = dodge
  ) +
  # scales and annotations
  scale_shape_manual(
    values = c("Average actual runtime" = 4, "Scheduled runtime" = 1)
  ) +
  scale_fill_brewer(palette = "Set2") +
  scale_x_continuous(labels = \(x) month.abb[x]) +
  # zoom without dropping boxes that extend past the limits
  coord_cartesian(ylim = c(80, 95)) +
  labs(
    title = "NYCT B Runtime vs Schedule at Peak, 2024 vs 2025",
    y = "End-to-end runtime (minutes)",
    x = NULL,
    fill = "Year",
    shape = NULL
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom"
  ) +
  facet_grid(rows = vars(direction), cols = vars(time_period))

Figure 4 reveals several trends during the peak period:

Northbound actual travel times are relatively consistent across the AM and PM peaks, whereas the southbound direction exhibits a pronounced running time increase in the PM peak.
Overall, southbound scheduled running times were more conservative than northbound ones, although this effect was more pronounced in 2024 than in 2025.
PM peak northbound scheduled running times were too aggressive in both 2024 and 2025: they were shorter than even the 25th percentile of actual trips. Scheduled running times were increased starting in June 2025, but actual running times increased at the same time.
Southbound PM peak running times were better scheduled than northbound ones, but the schedule was still consistently shorter than both the average and median actual runtimes in both 2024 and 2025. Actual runtimes appear to have improved marginally from 2024 to 2025, slightly narrowing the gap between scheduled and average actual runtime.
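The observations above compare each month's scheduled runtime with quantiles of actual runtimes. That rule can be made explicit with a small helper. This is a sketch: `classify_schedule()` and its labels are my own shorthand, and the example values are hypothetical, not taken from the data.

```r
# Classify a scheduled runtime against the interquartile range (IQR) of
# actual runtimes: below q25 is "aggressive", above q75 is "conservative"
classify_schedule <- function(avg_sched, q25, q75) {
  ifelse(avg_sched < q25, "aggressive",
         ifelse(avg_sched > q75, "conservative", "within IQR"))
}

# Hypothetical runtimes in minutes, for illustration only
classify_schedule(
  avg_sched = c(84, 88, 93),
  q25 = c(86, 86, 86),
  q75 = c(91, 91, 91)
)
```

Applied to the real data, this would be `runtimes_b_box |> mutate(schedule_fit = classify_schedule(avg_sched, q25, q75))`, which turns the visual reading of the chart into a table that can be filtered.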

We can now zero in on northbound train performance in the PM peak, since it appeared to be the greatest problem area.

Code

runtimes_b_box_pm_peak_nb <- runtimes_b |>
  filter(
    number_of_stops %in% c(37, 43) &
      time_period == "PM peak" &
      direction == "N"
  ) |>
  # keep 43 stop variation for shoulder months of church_ave_work
  slice_max(number_of_stops, by = month) |>
  rename(
    avg_actual = average_actual_runtime,
    avg_sched  = average_scheduled_runtime,
    q25 = `_25th_percentile_runtime`,
    q50 = `_50th_percentile_runtime`,
    q75 = `_75th_percentile_runtime`
  ) |>
  mutate(xpos = as.numeric(mo))

# half-width of each box (in days)
box_width <- 12

ggplot(runtimes_b_box_pm_peak_nb, aes(x = month, alpha = church_ave_work)) +
  # IQR boxes
  geom_rect(
    aes(
      xmin = month - days(box_width),
      xmax = month + days(box_width),
      ymin = q25,
      ymax = q75,
      fill = yr
    ),
    # fill = "lightgray",
    color = "black"
  ) +
  # Median line
  geom_segment(
    aes(
      x = month - days(box_width),
      xend = month + days(box_width),
      y = q50,
      yend = q50
    ),
    color = "black",
    linewidth = 0.6
  ) +
  # Average actual → X
  geom_point(
    aes(y = avg_actual, shape = "Average actual runtime"),
    size = 2.5,
    stroke = 0.9
  ) +
  # Scheduled → O
  geom_point(
    aes(y = avg_sched, shape = "Scheduled runtime"),
    size = 2.5,
    stroke = 0.9
  ) +
  scale_shape_manual(
    values = c("Average actual runtime" = 4,
               "Scheduled runtime" = 1)
  ) +
  scale_x_date(
    date_labels = "%b '%y",    # e.g., Mar '24
    date_breaks = "1 month",
    expand = c(0.02,0.02)
  ) +
  scale_fill_brewer(palette = "Set2") +
  scale_alpha_manual(
    values = c(1, 0.25),
    labels = c("No work", "Church Ave work"),
    guide = "none"
  ) +
  labs(
    title = "NYCT B PM Peak Northbound Runtime Distribution",
    y = "End-to-end runtime (minutes)",
    fill = "Year",
    x = NULL,
    shape = NULL,
    caption = "Note: B train service was disrupted from Aug 2024-Feb 2025 due to accessibility work at Church Ave.\nB service ran local between Kings Highway and Prospect Park and did not stop at Church Ave."
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom"
  )

Zooming in on PM peak northbound runs shows that runtimes continued to rise after July 2025. The adjusted schedules did reflect operational reality more closely, with scheduled runtimes hovering around the 25th percentile of actual runtimes, but additional corrective action, on either the scheduling or the operations side, would likely be warranted.

Now, let’s see what’s happening off peak.

Off-peak periods

While the B does not operate on nights or weekends, it does operate during middays and evenings, when crowding and operational conditions may differ.

Code

# let's look at months when regular service was running: March to July
runtimes_b_box_off_peak <- runtimes_b |>
  filter(
    # full Brighton Beach (D40) to 145th St (D13) runs in each direction
    stop_path_id %in% c("B-N-D40-D13-1", "B-S-D13-D40-2") &
      time_period %in% c("midday", "evening") &
      mo %in% 3:7
  ) |>
  rename(
    avg_actual = average_actual_runtime,
    avg_sched  = average_scheduled_runtime,
    q25 = `_25th_percentile_runtime`,
    q50 = `_50th_percentile_runtime`,
    q75 = `_75th_percentile_runtime`
  ) |>
  # as.numeric() on a factor returns level codes, so convert via character
  mutate(xpos = as.integer(as.character(mo)))

dodge <- position_dodge(width = 0.7)

ggplot(runtimes_b_box_off_peak, aes(x = xpos)) +
  # IQR boxes
  geom_rect(
    aes(
      xmin = xpos - 0.3,
      xmax = xpos + 0.3,
      ymin = q25,
      ymax = q75,
      fill = yr
    ),
    color = "black",
    position = dodge
  ) +
  # median lines
  geom_rect(
    aes(
      xmin = xpos - 0.3,
      xmax = xpos + 0.3,
      ymin = q50 - 0.08,
      ymax = q50 + 0.08,
      group = yr
    ),
    fill = "black",
    position = dodge
  ) +
  # average actual → X symbol
  geom_point(
    aes(y = avg_actual, shape = "Average actual runtime", group = yr),
    size = 2.2,
    stroke = 0.9,
    position = dodge
  ) +
  # scheduled → O symbol
  geom_point(
    aes(y = avg_sched, shape = "Scheduled runtime", group = yr),
    size = 2.2,
    stroke = 0.9,
    position = dodge
  ) +
  # scales and annotations
  scale_shape_manual(
    values = c("Average actual runtime" = 4, "Scheduled runtime" = 1)
  ) +
  scale_fill_brewer(palette = "Set1") +
  scale_x_continuous(labels = \(x) month.abb[x]) +
  # zoom without dropping boxes that extend past the limits
  coord_cartesian(ylim = c(60, 75)) +
  labs(
    title = "NYCT B Runtime vs Schedule Off Peak, 2024 vs 2025",
    y = "End-to-end runtime (minutes)",
    x = NULL,
    fill = "Year",
    shape = NULL
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom"
  ) +
  facet_grid(rows = vars(direction), cols = vars(time_period))

Off peak, runtimes vary somewhat less than at peak, and runtime variability decreased consistently, if mildly, from 2024 to 2025. Scheduling improvements are also clearer off peak: in particular, evening northbound runs received longer scheduled runtimes in 2025, which brought the schedule within the interquartile range of actual runs. However, as in the peak period, northbound runs adhered to the schedule less closely than southbound runs overall.
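The "decrease in runtime variability" can be quantified as the width of the interquartile range (q75 minus q25), averaged by year. This is a base-R sketch with hypothetical quantiles rather than the plotted values; with the real data, the inputs would be the `q25`, `q75`, and `yr` columns of `runtimes_b_box_off_peak`.

```r
# Mean IQR width (q75 - q25) of runtimes for each year, in minutes
iqr_width_by_year <- function(q25, q75, yr) {
  vapply(split(q75 - q25, yr), mean, numeric(1))
}

# Hypothetical monthly quantiles, for illustration only
widths <- iqr_width_by_year(
  q25 = c(62, 63, 62.5, 63),
  q75 = c(67, 68, 66, 66.5),
  yr  = c(2024, 2024, 2025, 2025)
)
widths
widths[["2025"]] - widths[["2024"]]  # negative means less variable in 2025
```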

Delayed trains

Let’s take a look at the trains_delayed dataset. From the documentation, a train is considered delayed if…

it arrives at its destination terminal more than five minutes late, if it did not make any scheduled station stops, or if it was scheduled to run but did not operate.
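The published dataset reports only monthly counts, but the rule itself is simple enough to state as code. This sketch reads the second condition as "skipped one or more scheduled stops." The function and argument names are hypothetical, since trip-level inputs like these are not part of `trains_delayed`.

```r
# Per-train delay rule quoted above, under the stated reading.
# Argument names are hypothetical, not columns in the MTA dataset.
is_delayed <- function(terminal_lateness_min, skipped_scheduled_stop, operated) {
  # A train that did not operate counts as delayed whatever the other inputs
  # are, so its lateness may be NA (TRUE | NA is TRUE in R).
  !operated | skipped_scheduled_stop | terminal_lateness_min > 5
}

is_delayed(
  terminal_lateness_min  = c(3, 6, 2, NA),
  skipped_scheduled_stop = c(FALSE, FALSE, TRUE, FALSE),
  operated               = c(TRUE, TRUE, TRUE, FALSE)
)
```

A train exactly five minutes late is not counted, because the threshold is "more than" five minutes.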

Therefore, this dataset will give us a sense of delays that were severe enough to affect runtimes (which, as we have seen above, appear to have occurred regularly!) but will not give us granularity in terms of where the delays occurred along the line.

Code

# use custom ordering for delay categories to maximize visual effectiveness
delay_category_order <- c(
  "External Factors",
  "Crew Availability",
  "Operating Conditions",
  "Planned ROW Work",
  "Police & Medical",
  "Infrastructure & Equipment"
)

delays_b_24_25 <- trains_delayed |>
  filter(month >= as.Date("2024-01-01") & line == "B") |>
  # collapse any subcategory rows to one total per month and category
  summarise(delays = sum(delays), .by = c(month, yr, mo, reporting_category)) |>
  mutate(
    reporting_category = factor(reporting_category, levels = delay_category_order)
  )

delays_l_24_25 <- trains_delayed |>
  filter(
    month >= as.Date("2024-01-01") &
    line == "L" &
    day_type == "weekday"
  ) |>
  summarise(delays = sum(delays), .by = c(month, yr, mo, reporting_category)) |>
  mutate(
    reporting_category = factor(reporting_category, levels = delay_category_order)
  )

# label every other month on the x axis
breaks_2_mo <- sort(unique(delays_b_24_25$month))[c(TRUE, FALSE)]

Code

delays_b_24_25 |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(position = "stack") +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    breaks = breaks_2_mo,
    expand = c(0.02,0.02)
    ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank()
    ) +
  labs(
    title = "NYCT B Train Delays by Category, 2024–25",
    y = "Delays per month",
    fill = "Category"
  )

Code

# same chart but proportional area
delays_b_24_25 |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(position = "fill") +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    breaks = breaks_2_mo,
    expand = c(0.02,0.02)
    ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank()
    ) +
  labs(
    title = "NYCT B Train Delays by Category, 2024–25",
    y = "Proportion of delays per month",
    fill = "Category"
  )

The dataset assigns delays to several reporting categories, which lets us zero in on issues that may be within the PAU’s purview to fix. For the purpose of this analysis, we will focus on delays due to Operating Conditions, which encompass “delays due to congestion, crowding, and from trains skipping stops to manage other delays.”

Code

# same proportional chart for the L train (weekdays only), for comparison
delays_l_24_25 |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(position = "fill") +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    breaks = breaks_2_mo,
    expand = c(0.02,0.02)
    ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank()
    ) +
  labs(
    title = "NYCT L Train Delays by Category, 2024–25 weekdays",
    y = "Proportion of delays per month",
    fill = "Category"
  )

Code

delays_b_l <- bind_rows(
  # tpd = trains per day; argument names become the `line` labels
  "B train (110 tpd)" = delays_b_24_25,
  "L train (277 tpd)" = delays_l_24_25,
  .id = "line"
) |>
  crossing(type = c("Total", "Proportional"))

delays_b_l |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(
    data = ~ filter(.x, type == "Total"),
    position = "stack"
  ) +
  geom_area(
    data = ~ filter(.x, type == "Proportional"),
    position = "fill"
  ) +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    date_breaks = "3 months",
    # breaks = breaks_2_mo,
    expand = c(0.02,0.02)
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank(),
    strip.placement = "outside",
    strip.background.y = element_blank()
  ) +
  labs(
    title = "NYCT B and L Train Delays by Category, 2024–25 weekdays",
    y = "Delays per month",
    fill = "Category"
  ) +
  facet_grid(
    rows = vars(type),
    cols = vars(line),
    scales = "free_y",
    switch = "y"
  )

Comparing the B train’s sources of delay in Figure 10 with those of the L train, which operates independently of all other subway lines, shows the effect of the B’s extensive interlining: Operating Conditions make up a much greater proportion of its delays.
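To put a single number on this comparison, one could compute the Operating Conditions share of each line’s delays over the whole period. This is a base-R sketch with made-up counts, not the real B and L figures; `category_share()` is a hypothetical helper. With the real data, filter `delays_b_l` to `type == "Total"` first, because `crossing()` duplicates every row.

```r
# Share of each line's delays that fall in one reporting category
category_share <- function(delays, category, line, target = "Operating Conditions") {
  in_target <- vapply(split(delays * (category == target), line), sum, numeric(1))
  total <- vapply(split(delays, line), sum, numeric(1))
  in_target / total
}

# Made-up counts, for illustration only
category_share(
  delays   = c(120, 80, 50, 150),
  category = c("Operating Conditions", "Infrastructure & Equipment",
               "Operating Conditions", "Infrastructure & Equipment"),
  line     = c("B", "B", "L", "L")
)
```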

Let’s take a look at the B’s schedules to see if we can discern any particular problem spots related to interlining.

Evaluating the schedule

Let’s begin with northbound B runs during the PM peak, which we identified above as especially problematic. We can compare characteristic runs from May 2025 and July 2025, before and after the schedule adjustments that improved on-time performance, to see where gains may have been made.

To compare like with like, we first check the MTA’s alerts archive for days with no alerts for B service.

Code

# get days that service ran
# query parquet dataset
service_dates_b <- open_dataset("data/schedule_b_24_25") |>
  filter(yr == 2025) |>
  distinct(service_date) |>
  collect() |>
  pull()

# get days with alerts
alert_days_b <- alerts_b_24_25 |>
  mutate(service_date = date(date)) |>
  filter(service_date >= ymd("2025-01-01")) |>
  distinct(service_date) |>
  arrange(service_date) |>
  pull(service_date)

days_without_alerts_b <- sort(setdiff(service_dates_b, alert_days_b))

days_without_alerts_b

[1] "2025-02-28" "2025-03-31" "2025-04-23" "2025-04-30" "2025-05-01"
[6] "2025-06-30" "2025-07-31" "2025-09-30"

Lucky us! May 1 and July 31 are both Thursdays, so let’s try to compare them.
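The weekday check can also be done in code rather than by eye. This sketch copies the alert-free dates printed above and keeps the Thursdays; it uses the `%u` format (ISO weekday, Monday = 1), which, unlike `weekdays()`, does not depend on the locale.

```r
# Alert-free B service days printed above
alert_free_days <- as.Date(c(
  "2025-02-28", "2025-03-31", "2025-04-23", "2025-04-30", "2025-05-01",
  "2025-06-30", "2025-07-31", "2025-09-30"
))

# "%u" gives the ISO weekday as "1" (Monday) through "7" (Sunday)
thursdays <- alert_free_days[format(alert_free_days, "%u") == "4"]
thursdays
```

The same filter with `"3"` would return the two April Wednesdays, but those two dates fall within a single month, so they cannot bracket the June schedule change.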

Code

schedules_b_thursday <- open_dataset("data/schedule_b_24_25") |>
  select(!starts_with(":")) |>
  select(!c(service_code, trip_line, division)) |>
  filter(
    service_date == as.Date("2025-05-01") |
    service_date == as.Date("2025-07-31")
  ) |>
  collect() |>
  arrange(service_date, direction, train_id, stop_order)

To evaluate the PM peak, let’s focus on trains leaving Brighton Beach between 4:00 pm and 5:00 pm. These are the following train_ids, all making 40 stops, which thankfully are mostly consistent across both days:

train_id (May)	train_id (July)
0B 1606+ BBC/BPK	0B 1606+ BBC/BPK
(none)	0B 1611 BBC/BPK
0B 1616 BBC/BPK	0B 1617 BBC/BPK
0B 1628+ BBC/BPK	0B 1628+ BBC/BPK
0B 1637 BBC/BPK	0B 1637 BBC/BPK
0B 1648+ BBC/BPK	0B 1648+ BBC/BPK
0B 1657 BBC/BPK	0B 1657 BBC/BPK

A few gtfs_stop_id entries in the schedules were not represented in the subway_stations dataset. These all occur at river crossings and appear to mark interlockings or simply the presence of a river crossing. I renamed them speculatively based on position: Q02 and D23 became Manhattan Bridge S and N, respectively, and D60 became Concourse Tunnel (where the IND Concourse Line crosses under the Harlem River north of 155 St station). I also relabeled D14 as 7 Av-53 St to distinguish it from the 7 Av station on the Brighton Line.

Code

train_ids_b <- c(
  "0B 1628+ BBC/BPK",
  "0B 1637  BBC/BPK",
  "0B 1648+ BBC/BPK",
  "0B 1657  BBC/BPK"
)

schedules_b_1600_nb <- schedules_b_thursday |>
  filter(
    train_id %in% train_ids_b
  ) |>
  select(!next_trip_time) |>
  pivot_longer(
    cols = ends_with("_time"),
    names_to = "time_type",
    values_to = "time",
    names_pattern = "(.*)_time"
  ) |>
  mutate(time = hms::as_hms(time)) |>
  left_join(
    subway_stations |> select("stop_name", "gtfs_stop_id"),
    by = "gtfs_stop_id"
  ) |>
  mutate(stop_name = case_match(
    gtfs_stop_id,
    "Q02" ~ "Manhattan Bridge S",
    "D23" ~ "Manhattan Bridge N",
    "D60" ~ "Concourse Tunnel",
    "D14" ~ "7 Av-53 St",
    .default = stop_name
  )) |>
  mutate(stop_name = as_factor(stop_name))

schedules_b_1600_nb |>
  # group_by(train_id, service_date) |>
  # filter(service_date == ymd("2025-07-31")) |>
  ggplot(aes(x = time, y = stop_name)) +
  geom_line(
    aes(
      group = interaction(train_id, service_date),
      color = as.factor(service_date)
    ),
    linewidth = 1
  ) +
  geom_point(size = 1) +
  scale_color_brewer(name = "Date", palette = "Accent") +
  scale_x_time(
    name = NULL,
    labels = \(x) format(as_datetime(x, tz = "UTC"), "%H:%M")
  ) +
  labs(
    title = "B Schedule Change Added 2-Minute Pause at 145 St",
    subtitle = "Selected scheduled northbound B trains, May 1 vs July 31, 2025",
    x = NULL,
    y = NULL
  ) +
  theme(
    legend.position = c(0.85,0.2)
  )

The stringlines in Figure 11 suggest that the B schedule change that took effect around June 2025 addressed on-time performance (OTP) surgically, by adding a 2-minute pause at 145 St. This pause may have been added to facilitate cross-platform transfers with northbound D express trains¹ or simply to allow delay recovery at the first “out-of-the-way” station in Manhattan where northbound B trains do not block other lines.
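A pause like this can also be found programmatically by computing each stop’s scheduled dwell, that is, departure minus arrival. This sketch assumes, as the `pivot_longer()` step above suggests, that the schedule carries separate arrival and departure times as "HH:MM:SS". `find_holds()`, the 1.5-minute threshold, and the timetable excerpt are all hypothetical.

```r
# Parse "HH:MM:SS" strings (hours may exceed 24) into seconds after midnight
hms_to_sec <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Scheduled dwell in minutes, flagging stops held at least `min_hold` minutes
find_holds <- function(stop_name, arrival, departure, min_hold = 1.5) {
  dwell_min <- (hms_to_sec(departure) - hms_to_sec(arrival)) / 60
  data.frame(stop_name, dwell_min, hold = dwell_min >= min_hold)
}

# Hypothetical timetable excerpt for one northbound trip
trip <- find_holds(
  stop_name = c("135 St", "145 St", "155 St"),
  arrival   = c("17:40:00", "17:41:30", "17:45:30"),
  departure = c("17:40:30", "17:43:30", "17:46:00")
)
trip[trip$hold, ]
```

Running the same function on the May 1 and July 31 trips and joining the results by `stop_name` would isolate the stops where dwell changed.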

Additional padding was added in the Bronx: the 1.5-minute intervals between Tremont Av and 182–183 Sts and between Kingsbridge Rd and Bedford Park Blvd were each increased to 2 minutes, stretching the Concourse Line run from 18 to 19 minutes. Otherwise, the schedule remained largely unchanged.
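The segment comparison amounts to differencing consecutive departure times on each date and subtracting. In this sketch, the cumulative times are hypothetical, chosen only so that the two changed segments match the text (1.5 to 2 minutes each). The unchanged segments are placeholders, not schedule values.

```r
stops <- c("Tremont Av", "182-183 Sts", "Fordham Rd", "Kingsbridge Rd", "Bedford Park Blvd")

# Seconds after departing Tremont Av on each date (hypothetical)
may_sec <- c(0, 90, 210, 300, 390)
jul_sec <- c(0, 120, 240, 330, 450)

# Running time for each consecutive pair of stops, in minutes
segments <- data.frame(
  from = head(stops, -1),
  to = tail(stops, -1),
  may_min = diff(may_sec) / 60,
  jul_min = diff(jul_sec) / 60
)
segments$change_min <- segments$jul_min - segments$may_min

segments[segments$change_min != 0, ]
sum(segments$change_min)  # total running time added over these stops, in minutes
```

Two half-minute increases add one minute in total, consistent with the 18-to-19-minute change on the Concourse Line.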

To better understand how these B trains interact with other lines, let’s add the B’s interlined services to our stringline chart, starting with the D. For ease of reading, we will omit stops in Brooklyn south of DeKalb Av.

Code

train_ids_d <- c(
  "0D 1617  STL/205", # July 31
  "0D 1619  STL/205", # May 1
  "0D 1623+ STL/205", # July 31
  "0D 1625+ STL/205", # May 1
  "0D 1629+ STL/205", # July 31
  "0D 1633  STL/205", # May 1
  "0D 1635+ STL/205", # July 31
  "0D 1643+ STL/205", # May 1
  "0D 1644  STL/205" # July 31
  # "0D 1649+ STL/205", # May 1
  # "0D 1652+ STL/205"  # July 31
)

schedules_d_1600_nb <- schedule_c_d_q_select |>
  filter(train_id %in% train_ids_d) |>
  select(!c(service_code, trip_line, division)) |>
  select(!next_trip_time) |>
  pivot_longer(
    cols = ends_with("_time"),
    names_to = "time_type",
    values_to = "time",
    names_pattern = "(.*)_time"
  ) |>
  mutate(time = hms::as_hms(time)) |>
  left_join(
    subway_stations |> select("stop_name", "gtfs_stop_id"),
    by = "gtfs_stop_id"
  ) |>
  mutate(stop_name = case_match(
    gtfs_stop_id,
    "Q02" ~ "Manhattan Bridge S",
    "D23" ~ "Manhattan Bridge N",
    "D60" ~ "Concourse Tunnel",
    "B24" ~ "Coney Island Creek Bridge",
    "D14" ~ "7 Av-53 St",
    .default = stop_name
  )) |>
  arrange(service_date, direction, train_id, stop_order) |>
  mutate(stop_name = as_factor(stop_name))

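# drop Brooklyn stops south of DeKalb Av (B stop_order 1-8) and renumber
schedules_b_mod <- schedules_b_1600_nb |>
  filter(stop_order > 8) |>
  mutate(stop_order_b_d = stop_order - 8) |>
  mutate(across(any_of(c("yr", "mo", "day")), as.factor))

# same for the D (stop_order 1-16 are south of DeKalb Av); shared stops
# borrow the B's renumbered order so both services share one y axis
schedules_d_mod <- schedules_d_1600_nb |>
  filter(stop_order > 16) |>
  left_join(
    schedules_b_mod |> select(gtfs_stop_id, stop_order_b_d),
    by = "gtfs_stop_id",
    multiple = "first"
  ) |>
  # Norwood-205 St is not a B stop, so place it at the north end by hand
  mutate(stop_order_b_d = case_match(stop_name, "Norwood-205 St" ~ 32, .default = stop_order_b_d)) |>
  # match the B's column types so bind_rows() can combine the two
  mutate(across(any_of(c("yr", "mo", "day")), as.factor))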

schedules_b_and_d <- bind_rows(schedules_b_mod, schedules_d_mod) |>
  mutate(
    stop_name = factor(
      stop_name,
      levels = unique(stop_name[order(stop_order_b_d)])
    )
  )

schedules_b_and_d |>
  filter(service_date == ymd("2025-07-31")) |>
  ggplot(aes(x = time, y = stop_name)) +
  geom_line(
    aes(
      group = interaction(train_id, service_date),
      color = line
    ),
    linewidth = 1
  ) +
  geom_point(size = 1) +
  scale_color_manual(name = "Service", values = c("#1f78b4","#fb9a99")) +
  scale_x_time(
    name = NULL,
    labels = \(x) format(as_datetime(x, tz = "UTC"), "%H:%M")
  ) +
  labs(
    title = "B and D Follow Closely after Merge at DeKalb",
    subtitle = "Selected scheduled northbound B and D trains, July 31, 2025",
    x = NULL,
    y = NULL,
    caption = "D trains do not stop at DeKalb Av, but it appears in the schedule.\nStops south of DeKalb Av not shown."
  ) +
  theme(
    legend.position = c(0.9,0.2)
  )

The pause added to the B at 145 St does not appear to be related to a cross-platform transfer. If it were, D trains would arrive at 145 St during the pause; instead, D arrivals at 145 St do not appear to be timed to coincide with B trains.

It is also worth noting that while B and D trains are scheduled to depart DeKalb Av at different times,² which facilitates their merge, some of the gaps are quite small: the 17:00 B, for example, is followed just 30 seconds later by a D. Such tight gaps make the schedule fragile. If the D cannot follow the B as closely as scheduled (which is likely, given the grade timer limitations on the Manhattan Bridge), it may be late for its merge with the A train south of 59 St-Columbus Circle.
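Tight gaps like this can be screened for systematically by sorting all scheduled departures from a shared point and differencing consecutive times. In this sketch, `tight_gaps()` is a hypothetical helper, and the 90-second threshold is an arbitrary placeholder, not an MTA standard. The departures are also hypothetical, except for the 17:00 B and the D 30 seconds behind it described above.

```r
# Parse "HH:MM:SS" strings into seconds after midnight
hms_to_sec <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Consecutive departures (any services) closer together than min_gap_sec
tight_gaps <- function(service, depart, min_gap_sec = 90) {
  sec <- hms_to_sec(depart)
  ord <- order(sec)
  trains <- data.frame(service = service[ord], depart = depart[ord], sec = sec[ord])
  n <- nrow(trains)
  pairs <- data.frame(
    leader = trains$service[-n],
    leader_departs = trains$depart[-n],
    follower = trains$service[-1],
    gap_sec = diff(trains$sec)
  )
  pairs[pairs$gap_sec < min_gap_sec, ]
}

# Hypothetical DeKalb Av departures
tight_gaps(
  service = c("B", "D", "D", "B"),
  depart  = c("17:00:00", "17:00:30", "16:53:00", "16:50:00")
)
```

With the real data, `service` and `depart` would come from the DeKalb Av departure rows of `schedules_b_and_d`.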

While installing communications-based train control (CBTC) may facilitate smoother operations through DeKalb Interlocking and over the Manhattan Bridge, it is difficult to look past deinterlining as a more robust and effective solution to the B train’s woes.

Footnotes

1. It is worth noting that while northbound B and D trains merge south of 145 St at most times, northbound PM peak D trains take the express track on the Concourse Line and thus avoid a merge with the northbound B at 145 St during this period. The same is true for southbound trains during the AM peak.
2. While D trains do not make a platform stop at DeKalb Av, it is still reflected in the D schedule as a timepoint.