NYCT Subway B Service: Data Exploration

Author

Seamus Joyce-Johnson

Published

January 23, 2026

Background

The B Sixth Avenue Express/Concourse Local/Brighton Local is a New York City Subway service that operates on weekdays (daytime) only. The B typically serves 27 stations from Brighton Beach in the south to 145th St in the north. Trains extend 10 additional stations north to Bedford Park Boulevard at rush hour. It is an express service in Brooklyn and Lower Manhattan and a local service in Upper Manhattan (north of 59th St) and the Bronx. The lines served and key merge points are as follows (south to north):

  • BMT Brighton Line (4 tracks, B uses inner express tracks and Q uses outer local tracks) starting at Brighton Beach
  • Prospect Park: between Parkside Av and Prospect Park stations, the Brighton Line narrows to two tracks and B and Q trains share tracks (northbound B and Q trains merge north of Parkside), using the outer tracks at DeKalb Av Station.
  • DeKalb Interlocking: this infamous merge reshuffles the B, D, N, and Q trains before they head over the Manhattan Bridge. The B and D services use the easternmost two tracks on the bridge to access the 6th Avenue Line. Northbound B trains can thus stay on the outermost track through the interlocking (and merge with D trains), but southbound B trains must cross over and merge with Q (and select N) trains.
  • Manhattan Bridge and Chrystie Street Connection (2 tracks, shared with D)
  • IND Sixth Avenue Line (4 tracks, B and D use inner express tracks and F and M use outer local tracks) from Chrystie Street Connection to 47th-50th Sts–Rockefeller Center
  • North of 47th-50th Sts–Rockefeller Center, the express tracks join the Eighth Avenue Line in a flying junction, requiring northbound B trains to merge with C trains on the local track and southbound B trains to merge with D trains on the express tracks.
  • The B continues as a local service (shared with C trains) on the Eighth Avenue Line until 135th St
  • Between 135th and 145th St stations, southbound B trains must merge with southbound C trains. Northbound B trains must merge with northbound D trains (except during the peak).
  • At 145th St (a three-track station), B trains short turn using the middle track outside of the peak. Departing trains must merge with southbound D trains.
  • IND Concourse Line: peak hour B trains continue north into the Bronx on the outer local tracks.
  • Bedford Park Boulevard: B trains terminate and reverse using the center track.

Merge points summary:

  • Northbound
    • Parkside/Prospect Park merge with Q train
    • DeKalb interlocking merge with D train
    • Columbus Circle merge with C train
    • 135th/145th merge with D train (off-peak)
  • Southbound
    • 145 St short-turn departing trains merge with D train (off-peak)
    • 145th/135th merge with C train
    • Columbus Circle merge with D train
    • DeKalb interlocking merge with Q train

Import data

Code
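# note: run 0_import_data script first to download data locally

# packages used in this document (assumed; the original setup chunk is not shown)
library(arrow)     # open_dataset()
library(dplyr)     # collect(), mutate(), filter(), bind_rows(), case_match()
library(tidyr)     # pivot_longer(), crossing()
library(ggplot2)   # plotting
library(lubridate) # ymd(), interval(), %within%, as_datetime()
library(forcats)   # as_factor()

# these datasets are pretty small, so we can load them into memory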
end_to_end_runtimes <- open_dataset("data/end_to_end_runtimes") |>
  collect() |>
  mutate(yr = as.factor(yr), mo = as.factor(mo))
trains_delayed <- open_dataset("data/trains_delayed") |> collect()
subway_stations <- open_dataset("data/subway_stations") |> collect()
alerts_b_24_25 <- open_dataset("data/alerts_b_24_25") |> collect()
schedule_c_d_q_select <- open_dataset("data/schedule_c_d_q_select") |> collect()

# b schedule dataset is huge, so we will query the parquet file as needed

Visualize runtimes

Let’s start by looking at the runtimes data for the B service. We are confining our analysis to 2024 and 2025. We will also focus on trips between the B’s typical terminals:

  • Brighton Beach (Station ID D40) aka BBC
  • 145th St (off peak) (Station ID D13) aka 145
  • Bedford Park Boulevard (peak) (Station ID D03) aka BPK
Code
runtimes_b <- end_to_end_runtimes |>
  filter(line == "B" & month >= as.POSIXct("2024-01-01")) |>
  mutate(
    church_ave_work = month %within% interval(
      ymd("2024-08-01"), ymd("2025-02-28")
    )
  ) |>
  arrange(month, time_period, direction)

# filter out short turns
runtimes_b_full_bpb <- runtimes_b |>
  filter(
    (origin_station_name == "BRIGHTON BEACH" &
      destination_station_name == "BEDFORD PARK BLVD") |
    (origin_station_name == "BEDFORD PARK BLVD" &
      destination_station_name == "BRIGHTON BEACH")
  )

runtimes_b_full_145 <- runtimes_b |>
  filter(
    (origin_station_name == "BRIGHTON BEACH" &
      destination_station_name == "145TH STREET") |
    (origin_station_name == "145TH STREET" &
      destination_station_name == "BRIGHTON BEACH")
  )

The number of stops served varies within each origin-destination (OD) pair. Most records show 37 stops (Bedford Park Blvd, abbreviated BPB in the code) or 27 stops (145th St), but a subset of each shows 6 additional stops.

Distribution of number of stops entries (145):

Code
table(runtimes_b_full_145$number_of_stops) |> knitr::kable()
Var1   Freq
26        3
27      126
29        1
33       53
35        4
35 4

Distribution of number of stops entries (BPB):

Code
table(runtimes_b_full_bpb$number_of_stops) |> knitr::kable()
Var1   Freq
36        4
37      147
42        3
43       58
45        5
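
To put numbers on the split, the frequencies printed above can be converted into shares. This is a small base R sketch that re-enters the two tables by hand; the vector names are illustrative and are not objects from the analysis.

```r
# Frequencies copied from the two tables above (number_of_stops -> Freq)
stops_145 <- c("26" = 3, "27" = 126, "29" = 1, "33" = 53, "35" = 4)
stops_bpb <- c("36" = 4, "37" = 147, "42" = 3, "43" = 58, "45" = 5)

# Share of records at each stop count, as a percentage
share_145 <- round(100 * prop.table(stops_145), 1)
share_bpb <- round(100 * prop.table(stops_bpb), 1)

share_145
share_bpb
```

With these counts, roughly two thirds of the records in each group sit at the expected stop count (27 or 37), and a little over a quarter at the variant with 6 additional stops.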

Delayed trains

Let’s take a look at the trains_delayed dataset. From the documentation, a train is considered delayed if…

it arrives at its destination terminal more than five minutes late, if it did not make any scheduled station stops, or if it was scheduled to run but did not operate.

Therefore, this dataset will give us a sense of delays that were severe enough to affect runtimes (which, as we have seen above, appear to have occurred regularly!) but will not give us granularity in terms of where the delays occurred along the line.
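
The quoted definition can be read as a three-way "or". The sketch below is one possible encoding of it, not the MTA's implementation: the helper name and its arguments are hypothetical, they are not columns of trains_delayed, and the second condition is interpreted here as "skipped at least one scheduled stop".

```r
# Hypothetical helper: flags a train as delayed under the quoted definition.
# A train counts as delayed if it did not operate, skipped a scheduled stop,
# or reached its terminal more than five minutes late.
is_delayed <- function(minutes_late, skipped_stops, operated) {
  !operated | skipped_stops > 0 | minutes_late > 5
}

# Made-up trains for illustration only
trains <- data.frame(
  minutes_late  = c(2, 7, 0, NA),
  skipped_stops = c(0, 0, 3, NA),
  operated      = c(TRUE, TRUE, TRUE, FALSE)
)

trains$delayed <- with(trains, is_delayed(minutes_late, skipped_stops, operated))
trains
```

Only the first toy train (2 minutes late, all stops made) is not flagged. Because the flag is all-or-nothing per train, it carries no information about where along the line the time was lost, which is the limitation noted above.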

Code
# use custom ordering for delay categories to maximize visual effectiveness
delay_category_order <- c(
  "External Factors",
  "Crew Availability",
  "Operating Conditions",
  "Planned ROW Work",
  "Police & Medical",
  "Infrastructure & Equipment"
)

delays_b_24_25 <- trains_delayed |>
  filter(month >= as.Date("2024-01-01") & line == "B") |>
  select(month, reporting_category, delays, yr, mo) |>
  mutate(
    reporting_category = factor(reporting_category, levels = delay_category_order)
  )

delays_l_24_25 <- trains_delayed |>
  filter(
    month >= as.Date("2024-01-01") &
    line == "L" &
    day_type == "weekday"
  ) |>
  select(month, reporting_category, delays, yr, mo) |>
  mutate(
    reporting_category = factor(reporting_category, levels = delay_category_order)
  )

breaks_2_mo <- unique(delays_b_24_25$month)[c(TRUE, FALSE)]
Code
delays_b_24_25 |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(position = "stack") +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    breaks = breaks_2_mo,
    expand = c(0.02,0.02)
    ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank()
    ) +
  labs(
    title = "NYCT B Train Delays by Category, 2024–25",
    y = "Delays per month",
    fill = "Category"
  )
Figure 7
Code
# same chart but proportional area
delays_b_24_25 |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(position = "fill") +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    breaks = breaks_2_mo,
    expand = c(0.02,0.02)
    ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank()
    ) +
  labs(
    title = "NYCT B Train Delays by Category, 2024–25",
    y = "Proportion of delays per month",
    fill = "Category"
  )
Figure 8

Several delay types are presented in the dataset, allowing us to zero in on issues that may be within the purview of the PAU to fix. For the purpose of this analysis, we will focus on delays due to Operating Conditions, which encompass “delays due to congestion, crowding, and from trains skipping stops to manage other delays.”
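
Focusing on one category amounts to tracking its share of each month's total, which is what position = "fill" draws in the proportional chart. A minimal base R sketch of that normalization follows; the counts are made up for illustration and are not values from trains_delayed.

```r
# Made-up monthly counts for illustration only
toy <- data.frame(
  month = as.Date(c("2024-01-01", "2024-01-01", "2024-02-01", "2024-02-01")),
  reporting_category = c("Operating Conditions", "Police & Medical",
                         "Operating Conditions", "Police & Medical"),
  delays = c(60, 40, 30, 90)
)

# position = "fill" rescales each month's stack to sum to 1;
# ave() reproduces that by dividing each count by its monthly total
toy$share <- toy$delays / ave(toy$delays, format(toy$month), FUN = sum)

# Share of delays attributed to Operating Conditions, by month
subset(toy, reporting_category == "Operating Conditions", select = c(month, share))
```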

Code
# proportional-area chart for the L train, for comparison with the B
delays_l_24_25 |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(position = "fill") +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    breaks = breaks_2_mo,
    expand = c(0.02,0.02)
    ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank()
    ) +
  labs(
    title = "NYCT L Train Delays by Category, 2024–25 weekdays",
    y = "Proportion of delays per month",
    fill = "Category"
  )
Figure 9
Code
delays_b_l <- bind_rows(delays_b_24_25, delays_l_24_25, .id = "line") |>
  mutate(line = case_match(line, "1" ~ "B train (110 tpd)", "2" ~ "L train (277 tpd)")) |>
  crossing(type = c("Total", "Proportional"))

delays_b_l |>
  ggplot(aes(x = month, y = delays, fill = reporting_category)) +
  geom_area(
    data = ~ filter(.x, type == "Total"),
    position = "stack"
  ) +
  geom_area(
    data = ~ filter(.x, type == "Proportional"),
    position = "fill"
  ) +
  scale_fill_brewer(palette = "Set3") +
  scale_x_date(
    date_labels = "%b '%y",
    date_breaks = "3 months",
    # breaks = breaks_2_mo,
    expand = c(0.02,0.02)
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank(),
    strip.placement = "outside",
    strip.background.y = element_blank()
  ) +
  labs(
    title = "NYCT B and L Train Delays by Category, 2024–25 weekdays",
    y = "Delays per month",
    fill = "Category"
  ) +
  facet_grid(
    rows = vars(type),
    cols = vars(line),
    scales = "free_y",
    switch = "y"
  )
Figure 10

Figure 10 compares the B train’s sources of delay with those of the L train, which is operationally independent from all other subway lines. The comparison shows the effect of the B’s extensive interlining: Operating Conditions comprise a much greater proportion of its delays.

Let’s take a look at the B’s schedules to see if we can discern any particular problem spots related to interlining.

Evaluating the schedule

Let’s begin by focusing on northbound B runs during the PM peak, which we identified as especially problematic above. We can pick a characteristic run from each of May 2025 and July 2025, before and after the schedule adjustments that improved on-time performance, to see where gains may have been made.

So that we can compare apples to apples, we can first check the MTA’s alerts archive to find days where there were no alerts for B service.

Code
# get days that service ran
# query parquet dataset
service_dates_b <- open_dataset("data/schedule_b_24_25") |>
  filter(yr == 2025) |>
  distinct(service_date) |>
  collect() |>
  pull()

# get days with alerts
alert_days_b <- alerts_b_24_25 |>
  mutate(service_date = date(date)) |>
  filter(service_date >= ymd("2025-01-01")) |>
  distinct(service_date) |>
  arrange(service_date) |>
  pull(service_date)

days_without_alerts_b <- sort(setdiff(service_dates_b, alert_days_b))

days_without_alerts_b
[1] "2025-02-28" "2025-03-31" "2025-04-23" "2025-04-30" "2025-05-01"
[6] "2025-06-30" "2025-07-31" "2025-09-30"

Lucky us! May 1 and July 31 are both Thursdays, so let’s try to compare them.
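
The day-of-week claim is easy to check in base R. The sketch below re-enters the dates printed above and uses the "%u" format code (ISO weekday number, 1 = Monday to 7 = Sunday) so the result does not depend on the locale's day names.

```r
# Alert-free dates as printed above
days_without_alerts <- as.Date(c(
  "2025-02-28", "2025-03-31", "2025-04-23", "2025-04-30",
  "2025-05-01", "2025-06-30", "2025-07-31", "2025-09-30"
))

# ISO weekday number: 1 = Monday ... 7 = Sunday
iso_weekday <- as.integer(format(days_without_alerts, "%u"))

# Keep the Thursdays (ISO weekday 4)
thursdays <- days_without_alerts[iso_weekday == 4]
format(thursdays)
```

Of the eight dates, only May 1 and July 31 fall on a Thursday.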

Code
schedules_b_thursday <- open_dataset("data/schedule_b_24_25") |>
  select(!starts_with(":")) |>
  select(!c(service_code, trip_line, division)) |>
  filter(
    service_date == as.Date("2025-05-01") |
    service_date == as.Date("2025-07-31")
  ) |>
  collect() |>
  arrange(service_date, direction, train_id, stop_order)

To evaluate the PM peak, let’s focus on trains leaving Brighton Beach between 4:00 pm and 5:00 pm. These are the following train_ids, all making 40 stops, which thankfully are mostly consistent across both days:

train_id (May) train_id (July)
0B 1606+ BBC/BPK 0B 1606+ BBC/BPK
0B 1611 BBC/BPK
0B 1616 BBC/BPK 0B 1617 BBC/BPK
0B 1628+ BBC/BPK 0B 1628+ BBC/BPK
0B 1637 BBC/BPK 0B 1637 BBC/BPK
0B 1648+ BBC/BPK 0B 1648+ BBC/BPK
0B 1657 BBC/BPK 0B 1657 BBC/BPK
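
The train_ids appear to embed the scheduled departure time from the origin terminal. The sketch below parses them on that basis. It rests on two assumptions that the source does not document: that the four-digit field is HHMM, and that a trailing "+" adds half a minute. The parser name is hypothetical.

```r
# A few train_ids from the table above
train_ids <- c("0B 1606+ BBC/BPK", "0B 1628+ BBC/BPK",
               "0B 1637 BBC/BPK", "0B 1657 BBC/BPK")

# Hypothetical parser: assumes the 4-digit field is HHMM and "+" adds 30 s.
# Returns seconds since midnight.
parse_departure <- function(train_id) {
  m <- regmatches(train_id, regexec("([0-9]{2})([0-9]{2})(\\+?)", train_id))
  vapply(m, function(p) {
    half_minute <- if (p[4] == "+") 30 else 0
    as.numeric(p[2]) * 3600 + as.numeric(p[3]) * 60 + half_minute
  }, numeric(1))
}

departures <- parse_departure(train_ids)

# Check that every parsed departure falls in the 16:00 to 17:00 window
in_window <- departures >= 16 * 3600 & departures <= 17 * 3600
data.frame(train_id = train_ids, departure_s = departures, in_window)
```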

There were a few gtfs_stop_id entries in the schedules that were not represented in the subway_stations dataset. These all occurred at river crossings and appear to mark interlockings or simply the presence of a river crossing. I renamed them speculatively based on position: Q02 and D23 became Manhattan Bridge S and N, respectively, and D60 became Concourse Tunnel (where the IND Concourse Line crosses under the Harlem River north of 155 St Station). The code below also relabels D14 as 7 Av-53 St.
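
One way to surface such unmatched identifiers is a set difference between the schedule's stop ids and the station list. This is a base R sketch on toy vectors rather than the actual datasets; the three station ids are the terminal ids quoted earlier, and the variable names are illustrative.

```r
# Toy inputs: the schedule includes timepoints that the station list lacks
schedule_stop_ids <- c("D40", "Q02", "D23", "D13", "D60", "D03")
station_stop_ids  <- c("D40", "D13", "D03")

# Ids that appear in the schedule but have no station record
unmatched <- setdiff(schedule_stop_ids, station_stop_ids)
unmatched

# Speculative labels, as described in the text
labels <- c(Q02 = "Manhattan Bridge S", D23 = "Manhattan Bridge N",
            D60 = "Concourse Tunnel")
unname(labels[unmatched])
```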

Code
train_ids_b <- c(
  "0B 1628+ BBC/BPK",
  "0B 1637  BBC/BPK",
  "0B 1648+ BBC/BPK",
  "0B 1657  BBC/BPK"
)

schedules_b_1600_nb <- schedules_b_thursday |>
  filter(
    train_id %in% train_ids_b
  ) |>
  select(!next_trip_time) |>
  pivot_longer(
    cols = ends_with("_time"),
    names_to = "time_type",
    values_to = "time",
    names_pattern = "(.*)_time"
  ) |>
  mutate(time = hms::as_hms(time)) |>
  left_join(
    subway_stations |> select("stop_name", "gtfs_stop_id"),
    by = "gtfs_stop_id"
  ) |>
  mutate(stop_name = case_match(
    gtfs_stop_id,
    "Q02" ~ "Manhattan Bridge S",
    "D23" ~ "Manhattan Bridge N",
    "D60" ~ "Concourse Tunnel",
    "D14" ~ "7 Av-53 St",
    .default = stop_name
  )) |>
  mutate(stop_name = as_factor(stop_name))

schedules_b_1600_nb |>
  # group_by(train_id, service_date) |>
  # filter(service_date == ymd("2025-07-31")) |>
  ggplot(aes(x = time, y = stop_name)) +
  geom_line(
    aes(
      group = interaction(train_id, service_date),
      color = as.factor(service_date)
    ),
    linewidth = 1
  ) +
  geom_point(size = 1) +
  scale_color_brewer(name = "Date", palette = "Accent") +
  scale_x_time(
    name = NULL,
    labels = \(x) format(as_datetime(x, tz = "UTC"), "%H:%M")
  ) +
  labs(
    title = "B Schedule Change Added 2-Minute Pause at 145 St",
    subtitle = "Selected scheduled northbound B trains, May 1 vs July 31, 2025",
    x = NULL,
    y = NULL
  ) +
  theme(
    legend.position = c(0.85,0.2)
  )
Figure 11

The stringlines in Figure 11 suggest that the schedule change to the B service around June 2025 addressed on-time performance (OTP) surgically, by adding a 2-minute pause in the schedule at 145 St. This pause may have been added to facilitate cross-platform transfers with northbound D express trains¹ or simply to allow delay recovery at the first “out-of-the-way” station in Manhattan where northbound B trains do not block other lines.
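
A pause like this can also be found numerically, by subtracting scheduled arrival from scheduled departure at each stop. The sketch below uses made-up times for a single train; the column names arrival_time and departure_time are assumptions about the schedule's layout, and to_seconds is a hypothetical helper.

```r
# Made-up timetable rows for one northbound train (illustration only)
toy <- data.frame(
  stop_name      = c("125 St", "135 St", "145 St", "155 St"),
  arrival_time   = c("17:10:00", "17:12:30", "17:14:00", "17:18:00"),
  departure_time = c("17:10:30", "17:13:00", "17:16:00", "17:18:30"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM:SS" strings to seconds since midnight
to_seconds <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 3600 + as.numeric(parts[, 2]) * 60 + as.numeric(parts[, 3])
}

# Scheduled dwell at each stop, in seconds
toy$dwell_s <- to_seconds(toy$departure_time) - to_seconds(toy$arrival_time)

# Stops with a scheduled dwell of two minutes or more
toy$stop_name[toy$dwell_s >= 120]
```

Running the same calculation on both service dates and comparing the results stop by stop would isolate where the July schedule added time.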

Additional padding was added in the Bronx: the 1.5-minute intervals between Tremont Av and 182–183 Sts and between Kingsbridge Rd and Bedford Park Blvd were each increased to 2 minutes, stretching the Concourse Line run from 18 minutes to 19 minutes. Otherwise, the schedule appeared largely the same.
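
The one-minute difference follows directly from the two lengthened intervals, as this short check using only the figures quoted above shows.

```r
# Two Concourse Line intervals grew from 1.5 to 2 minutes (figures from the text)
n_intervals <- 2
added_min <- n_intervals * (2 - 1.5)

# Concourse Line run time before and after the change, in minutes
run_before_min <- 18
run_after_min <- run_before_min + added_min
run_after_min
```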

To better understand how these B trains interact with other lines, let’s add their interlined services to our stringline chart, starting with the D. For ease of reading, we will omit stops in Brooklyn south of DeKalb Av.

Code
train_ids_d <- c(
  "0D 1617  STL/205", # July 31
  "0D 1619  STL/205", # May 1
  "0D 1623+ STL/205", # July 31
  "0D 1625+ STL/205", # May 1
  "0D 1629+ STL/205", # July 31
  "0D 1633  STL/205", # May 1
  "0D 1635+ STL/205", # July 31
  "0D 1643+ STL/205", # May 1
  "0D 1644  STL/205" # July 31
  # "0D 1649+ STL/205", # May 1
  # "0D 1652+ STL/205"  # July 31
)

schedules_d_1600_nb <- schedule_c_d_q_select |>
  filter(train_id %in% train_ids_d) |>
  select(!c(service_code, trip_line, division)) |>
  select(!next_trip_time) |>
  pivot_longer(
    cols = ends_with("_time"),
    names_to = "time_type",
    values_to = "time",
    names_pattern = "(.*)_time"
  ) |>
  mutate(time = hms::as_hms(time)) |>
  left_join(
    subway_stations |> select("stop_name", "gtfs_stop_id"),
    by = "gtfs_stop_id"
  ) |>
  mutate(stop_name = case_match(
    gtfs_stop_id,
    "Q02" ~ "Manhattan Bridge S",
    "D23" ~ "Manhattan Bridge N",
    "D60" ~ "Concourse Tunnel",
    "B24" ~ "Coney Island Creek Bridge",
    "D14" ~ "7 Av-53 St",
    .default = stop_name
  )) |>
  arrange(service_date, direction, train_id, stop_order) |>
  mutate(stop_name = as_factor(stop_name))

schedules_b_mod <- schedules_b_1600_nb |>
  filter(stop_order > 8) |>
  mutate(stop_order_b_d = stop_order - 8) |>
  mutate(across(c(yr, mo, day), as.factor))

schedules_d_mod <- schedules_d_1600_nb |>
  filter(stop_order > 16) |>
  left_join(
    schedules_b_mod |> select(gtfs_stop_id, stop_order_b_d),
    by = "gtfs_stop_id",
    multiple = "first"
  ) |>
  mutate(stop_order_b_d = case_match(stop_name, "Norwood-205 St" ~ 32, .default = stop_order_b_d))

schedules_b_and_d <- bind_rows(schedules_b_mod, schedules_d_mod) |>
  mutate(
    stop_name = factor(
      stop_name,
      levels = unique(stop_name[order(stop_order_b_d)])
    )
  )

schedules_b_and_d |>
  filter(service_date == ymd("2025-07-31")) |>
  ggplot(aes(x = time, y = stop_name)) +
  geom_line(
    aes(
      group = interaction(train_id, service_date),
      color = line
    ),
    linewidth = 1
  ) +
  geom_point(size = 1) +
  scale_color_manual(name = "Service", values = c("#1f78b4","#fb9a99")) +
  scale_x_time(
    name = NULL,
    labels = \(x) format(as_datetime(x, tz = "UTC"), "%H:%M")
  ) +
  labs(
    title = "B and D Follow Closely after Merge at DeKalb",
    subtitle = "Selected scheduled northbound B and D trains, July 31, 2025",
    x = NULL,
    y = NULL,
    caption = "D trains do not stop at DeKalb Av, but it appears in the schedule.\nStops south of DeKalb Av not shown."
  ) +
  theme(
    legend.position = c(0.9,0.2)
  )
Figure 12

The B pause added at 145 St does not appear to be related to a cross-platform transfer. If it were, D trains would arrive at 145 St during the pause. Instead, D arrivals at 145 St do not appear to be timed to coincide with B trains.
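
The test applied here is an interval check: a timed transfer would require a D arrival inside the window in which the B is standing at the platform. A minimal sketch with made-up times (not values from the schedule):

```r
# Made-up times in seconds since midnight, for illustration only
b_arrive <- 17 * 3600 + 14 * 60        # B arrives 17:14:00
b_depart <- b_arrive + 120             # B departs after a 2-minute pause
d_arrivals <- c(17 * 3600 + 9 * 60,    # D arrives 17:09:00
                17 * 3600 + 19 * 60)   # D arrives 17:19:00

# A cross-platform transfer needs a D arrival inside the B's pause window
transfer_possible <- any(d_arrivals >= b_arrive & d_arrivals <= b_depart)
transfer_possible
```

In this toy case neither D arrival falls inside the window, which is the pattern the stringlines suggest for the real schedule.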

It is also worth noting that while B and D trains are scheduled to depart DeKalb Av at different times,² which facilitates their merge, some of the gaps are quite small, such as the 17:00 B, which is followed 30 seconds later by a D. Such tight gaps make the schedule fragile: if the D train cannot follow the B as closely as scheduled (likely, given the grade timer limitations on the Manhattan Bridge), it may be late for its merge with the A train south of 59 St-Columbus Circle.
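
Tight gaps like this can be listed systematically by sorting all departures at a shared timepoint and differencing them. In the sketch below, only the 17:00:00 B followed 30 seconds later by a D comes from the text; the other times and the 90-second threshold are made up for illustration.

```r
# Departures at a shared timepoint, in seconds since midnight
deps <- data.frame(
  line = c("B", "D", "B", "D"),
  time_s = c(16 * 3600 + 50 * 60,   # 16:50:00 (made up)
             16 * 3600 + 54 * 60,   # 16:54:00 (made up)
             17 * 3600,             # 17:00:00 (from the text)
             17 * 3600 + 30),       # 17:00:30 (from the text)
  stringsAsFactors = FALSE
)
deps <- deps[order(deps$time_s), ]

# Gap to the preceding train, in seconds (NA for the first train)
deps$gap_s <- c(NA, diff(deps$time_s))

# Followers scheduled under a hypothetical 90-second threshold
tight <- deps[!is.na(deps$gap_s) & deps$gap_s < 90, ]
tight
```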

While CBTC installation may facilitate smoother operations through DeKalb Interlocking and over the Manhattan Bridge, it is difficult to look past deinterlining as a more robust and effective solution to the B train’s woes.

Footnotes

  1. It is worth noting that while northbound B and D trains merge south of 145 St at most times, northbound PM peak D trains take the express track on the Concourse Line and thus avoid a merge with the northbound B at 145 St during this period. The same is true for southbound trains during the AM peak.

  2. While D trains do not make a platform stop at DeKalb Av, it is still reflected in the D schedule as a timepoint.