Tiebreaker format comparisons: Battlefy vs. Sendou

Author

glumbaron

Published

December 19, 2024

Introduction

In December 2024, IPL switched from Battlefy to sendou.ink for Low Ink, our large skill-capped monthly tournament. This largely went well, with some wrinkles here and there that we’ve been working out with Sendou since then. One of the most outwardly noticeable effects of the switch was the change in tiebreaker format. Low Ink uses a Swiss bracket for Day 1, and the tiebreaker method used with it has a big effect on which teams make which Day 2 brackets. Here, I want to analyze the effects of this change by recalculating the various tiebreaker methods and applying them to the standings for Low Ink December (our first Sendou month) and November (our final Battlefy month).

Results

The simplest way to show the impact of the different tiebreaker formats is to re-rank each tournament using the other platform’s tiebreakers, to see what would have been different if each had been run on the other platform. So I’m going to put a bunch of tables in here. If you want to read my explanation of the differences, jump to the section “Tiebreaker formats: the gory details.”

First, if November Low Ink had been run on Sendou with its tiebreaker format:

Low Ink November Group A, with Sendou tiebreakers. The # column is the recalculated rank under Sendou’s tiebreakers; Rank is the original rank on Battlefy.
# Rank Team W L TB W_g L_g Buch Buch_g Dropped
1 1 Mulch Enjoyers 5 1 0 14 4 22 60 FALSE
2 2 DISCLAIMER* 5 1 0 13 5 22 64 FALSE
3 4 Jet Lag! 5 1 -1 14 4 21 59 FALSE
4 3 UCSD Esports Gold 5 1 -1 13 5 21 61 FALSE
5 5 Celestial 5 1 -1 13 5 20 57 FALSE
6 8 Tidal Disruption 4 2 0 13 5 21 60 FALSE
7 11 THIS IS FINE 4 2 0 13 5 18 52 FALSE
8 6 Remix 4 2 0 11 7 26 72 FALSE
9 12 Dkreef 4 2 0 11 7 19 59 FALSE
10 14 Inkception 4 2 0 11 7 15 46 FALSE
11 10 Moonshine 4 2 -1 13 5 20 53 FALSE
12 13 Microwave Mode 4 2 -1 12 6 18 53 FALSE
13 7 Oracle 4 2 -1 11 7 22 64 FALSE
14 9 FUIT GUMMY (Rebirth) 4 2 -2 14 4 20 56 FALSE
15 17 XNAME 3 3 0 11 7 21 58 FALSE
16 15 Aurora 3 3 0 10 8 23 67 FALSE
17 25 Brigada Woomy 3 3 0 9 9 15 45 FALSE
18 19 Butterfly Effect 3 3 0 8 10 19 60 FALSE
19 21 EWfish 3 3 0 8 10 17 54 FALSE
20 16 Tidal Tempest 3 3 0 7 11 22 63 FALSE
21 18 Guess What!? 3 3 0 6 12 21 71 FALSE
22 20 Krill Switch 3 3 -1 9 9 19 58 FALSE
23 26 RIT Orange Kois 3 3 -2 12 6 13 37 FALSE
24 24 Sea Flat 3 3 -2 10 8 15 44 FALSE
25 22 Killer Blues 3 3 -2 9 9 17 51 FALSE
26 23 Auraboros 3 3 -2 8 10 16 51 FALSE
27 28 IGNITE 2 4 0 8 10 19 58 FALSE
28 34 Nah I’d Ink 2 4 0 8 10 14 39 FALSE
29 27 Sour Patch Squids 2 4 0 6 12 20 59 FALSE
30 29 Trident 2 4 0 4 14 18 56 FALSE
31 31 Little Timmy and the Splats 2 4 -1 8 10 16 52 FALSE
32 33 ⋆ 𖥨. ݁Sea Angels ꒱ ֺЮৎ 2 4 -1 7 10 13 46 FALSE
33 30 Kiwi Ice Cream 2 4 -1 5 13 17 53 FALSE
34 32 Soliloquy of the Stars 2 4 -1 5 13 16 57 FALSE
35 35 octo rehab 2 4 -2 7 11 13 33 FALSE
36 38 RIT Black Seadragons 1 5 0 5 13 12 32 FALSE
37 36 Moray Eels 1 5 0 2 15 15 49 FALSE
38 37 UMassD Goop Squad 1 5 -2 2 15 12 40 FALSE
39 40 Powdermelon Penguins 1 5 0 5 13 14 42 TRUE
40 39 Malevolent Cooler 1 2 0 2 7 11 34 TRUE

Low Ink November Group B, with Sendou tiebreakers. The # column is the recalculated rank under Sendou’s tiebreakers; Rank is the original rank on Battlefy.
# Rank Team W L TB W_g L_g Buch Buch_g Dropped
1 1 Tidy Tidings 6 0 0 15 3 20 59 FALSE
2 2 Inkbound 5 1 0 14 4 24 66 FALSE
3 3 REWIND!!! 5 1 0 13 5 21 62 FALSE
4 4 Black Squid Ink Burger 5 1 -1 13 5 20 62 FALSE
5 6 Reload 4 2 0 13 5 23 65 FALSE
6 10 I love latinas Sonic 4 2 0 13 5 20 61 FALSE
7 5 Sonora Inkamita de Iztapalapa 4 2 0 11 7 24 65 FALSE
8 7 Fearless 4 2 -1 12 6 22 64 FALSE
9 12 Asphyxiation 4 2 -1 12 6 19 57 FALSE
10 14 Angelink 4 2 -1 12 6 16 51 FALSE
11 8 Los Inklings and Friends 4 2 -1 11 7 22 60 FALSE
12 9 Malicious Misery 4 2 -1 11 7 21 61 FALSE
13 11 FreakForce 4 2 -1 8 10 20 62 FALSE
14 13 Ghistra 4 2 -2 11 7 18 58 FALSE
15 16 Starfish 3 3 0 10 8 22 62 FALSE
16 17 Determination 3 3 0 10 8 21 64 FALSE
17 15 Squid Rollups 3 3 0 9 9 23 69 FALSE
18 25 Haihappen 3 3 0 9 9 13 42 FALSE
19 23 KANA ON TOP!!!! 3 3 0 8 10 16 51 FALSE
20 21 Coconut Crabs 3 3 -1 11 7 14 42 FALSE
21 24 All I Want for Christmas 3 3 -1 11 7 14 39 FALSE
22 18 YT->MP4 3 3 -1 10 8 19 55 FALSE
23 20 Turbo Torben 3 3 -1 10 8 17 46 FALSE
24 19 Kurain’s Grasp 3 3 -1 9 9 17 47 FALSE
25 26 Astral Ink 3 3 -1 8 10 12 35 FALSE
26 22 Red Velvet Sea 3 3 -2 10 8 16 46 FALSE
27 27 Arrows of Gale 2 4 0 7 11 21 62 FALSE
28 28 Clownfish 2 4 0 7 11 20 58 FALSE
29 29 HP OfficeJet Pro 6962 All-In-One Printer 2 4 0 5 13 19 62 FALSE
30 31 Octaves 2 4 0 5 13 16 54 FALSE
31 32 Cardinal Cephalopods 2 4 0 4 14 16 50 FALSE
32 30 No Crosswords 2 4 -1 7 11 16 52 FALSE
33 34 Death by 100 papercuts 2 4 -2 7 11 14 43 FALSE
34 35 Reignfall 2 4 -2 7 11 14 43 FALSE
35 36 Squidlife Crisis 2 4 -2 7 11 10 30 FALSE
36 33 AD MARE 🪼 2 4 -2 6 12 15 44 FALSE
37 38 Six-Pack Yokes 1 5 0 4 14 13 46 FALSE
38 37 Washed Inktrails 1 5 -1 5 12 13 40 FALSE
39 39 Octowusses 0 6 0 0 18 20 61 FALSE
40 40 R1PT1D3 1 4 0 4 11 17 49 TRUE

Next, if December Low Ink had been run on Battlefy with its tiebreaker format:

Low Ink December Group A, with Battlefy tiebreakers. The # column is the recalculated rank under Battlefy’s tiebreakers; Rank is the original rank on Sendou.
# Rank Team W L OW% W_g% OOW% Dropped
1 1 Serves up 6 0 58.333 83.333 56.640 FALSE
2 2 FlipSide 5 1 72.222 77.778 55.307 FALSE
3 3 Ball up top 5 1 61.111 88.889 52.526 FALSE
4 4 Star Allies 5 1 61.111 72.222 56.015 FALSE
5 5 Exam Week 4 2 66.667 61.111 53.167 FALSE
6 7 Squid Rollups 4 2 63.833 55.556 54.304 FALSE
7 12 Event Horizon 4 2 61.111 55.556 52.940 FALSE
8 8 idiots in chat 4 2 55.556 72.222 53.795 FALSE
9 9 I love Latinas Sonic 4 2 55.500 61.111 59.406 FALSE
10 13 Calamity Forge Neo 4 2 55.500 55.556 51.361 FALSE
11 6 FREAKFORCE 4 2 52.722 61.111 53.594 FALSE
12 11 Los Inklings 4 2 49.944 61.111 55.043 FALSE
13 10 Malicious Misery 4 2 47.167 61.111 51.827 FALSE
14 14 Fearless 3 3 63.889 55.556 55.528 FALSE
15 15 Tidal Tempest 3 3 58.333 55.556 52.985 FALSE
16 19 11 ft 8 Bridge 3 3 55.556 61.111 50.653 FALSE
17 21 EWfish 3 3 55.556 50.000 55.307 FALSE
18 18 Octaves 3 3 55.500 44.444 57.307 FALSE
19 16 Tocinitos de cielo 3 3 52.722 50.000 53.580 FALSE
20 22 Gubi Fortnite?! 3 3 47.222 50.000 49.117 FALSE
21 20 squid squad 3 3 47.167 61.111 48.507 FALSE
22 24 XName 3 3 47.111 55.556 57.905 FALSE
23 17 Six-Pack Yokes 3 3 44.389 50.000 49.939 FALSE
24 25 Inkbound 3 3 41.667 50.000 51.470 FALSE
25 23 Milkyway 3 3 41.611 44.444 51.470 FALSE
26 30 Reignfall 2 4 58.278 33.333 49.944 FALSE
27 26 TAMU Maroon 2 4 55.556 50.000 56.372 FALSE
28 29 Zero Magnum 2 4 55.556 33.333 51.391 FALSE
29 27 Bad at Math 2 4 44.389 38.889 50.825 FALSE
30 32 octo rehab 2 4 44.333 38.889 50.860 FALSE
31 28 Paladins 2 4 44.333 38.889 48.556 FALSE
32 31 Cod in 4K 2 4 41.611 33.333 54.730 FALSE
33 33 The British Empire 1 5 47.167 22.222 49.307 FALSE
34 34 Job Application 0 6 38.833 0.000 52.736 FALSE
35 35 Moonshine 3 3 49.944 61.111 57.127 TRUE
36 36 [Name Pending] 2 4 46.667 27.778 52.704 TRUE
37 37 Nah I’d ink 1 3 58.333 33.333 53.667 TRUE
38 38 morning struggles 1 4 45.833 20.000 48.193 TRUE
39 39 Moonslice 0 2 58.333 16.667 53.000 TRUE
40 40 Overdive 0 1 50.000 0.000 49.944 TRUE

Low Ink December Group B, with Battlefy tiebreakers. The # column is the recalculated rank under Battlefy’s tiebreakers; Rank is the original rank on Sendou.
# Rank Team W L OW% W_g% OOW% Dropped
1 1 Jet Lag! 6 0 62.778 83.333 56.788 FALSE
2 5 THIS IS FINE 5 1 58.333 72.222 57.181 FALSE
3 4 Inkception 5 1 58.278 66.667 56.317 FALSE
4 2 Remix 5 1 56.667 72.222 56.884 FALSE
5 3 Ghistra 5 1 55.556 72.222 54.111 FALSE
6 9 Celestial 4 2 66.611 61.111 53.000 FALSE
7 10 Distortion 4 2 63.889 77.778 55.242 FALSE
8 6 Caniac Central 4 2 63.778 72.222 56.983 FALSE
9 7 Inferno 4 2 60.000 66.667 54.000 FALSE
10 13 Starfish 4 2 52.722 55.556 55.206 FALSE
11 12 Crisis Averted 4 2 52.722 55.556 52.985 FALSE
12 11 Pixel per ink 4 2 44.444 61.111 51.696 FALSE
13 8 Red Velvet Sea 4 2 43.278 66.667 54.133 FALSE
14 15 Twister Time 3 3 65.000 50.000 53.580 FALSE
15 16 o7 Honor Bound 3 3 61.111 50.000 55.739 FALSE
16 17 seasick 3 3 61.000 50.000 54.730 FALSE
17 18 Error 404 3 3 54.389 55.556 55.754 FALSE
18 20 Sheer Cold 3 3 52.667 44.444 55.727 FALSE
19 21 Cloud Strikers! 3 3 49.944 38.889 46.042 FALSE
20 19 Brigada Woomy 3 3 47.111 55.556 52.016 FALSE
21 14 Krill Streak 3 3 45.500 66.667 50.730 FALSE
22 22 Loud Noises 3 3 44.444 55.556 55.492 FALSE
23 23 Tide Breaker 2 4 52.722 44.444 49.942 FALSE
24 25 Deep Sea International 2 4 52.667 38.889 48.058 FALSE
25 26 Pringle Cat 2 4 49.889 38.889 54.806 FALSE
26 28 Cooler High 2 4 45.500 38.889 50.825 FALSE
27 27 Baguette squad 2 4 38.722 33.333 46.286 FALSE
28 24 The Slam Blitzers 2 4 38.667 44.444 50.712 FALSE
29 29 C-50 1 5 36.056 11.111 48.058 FALSE
30 30 Starstrikerz 0 6 33.222 5.556 43.391 FALSE
31 31 Tidy Tidings 3 2 73.333 66.667 57.877 TRUE
32 32 REWIND!!! 3 3 55.556 38.889 51.152 TRUE
33 39 Cosmic Hunters 1 1 41.667 25.000 45.111 TRUE
34 33 Asphyxiation 2 3 60.000 53.333 55.800 TRUE
35 34 Chirpy Chips Crusaders 2 4 52.722 38.889 47.238 TRUE
36 37 Fish Paste 1 3 58.333 25.000 48.778 TRUE
37 38 SplatSCAD-Orange 1 3 50.000 25.000 56.600 TRUE
38 35 Squid’s Rendezvous 1 5 50.000 27.778 49.931 TRUE
39 36 TAMU White 1 5 41.611 22.222 48.464 TRUE
40 40 Citronnade 0 2 66.667 0.000 59.933 TRUE

We can also show the differences between methods graphically, by plotting new ranking vs. old ranking for each group. I’ve used dashed lines to highlight the 1:1 correspondence line as well as the Alpha/Beta/Gamma bracket cutoffs in each plot.

Lots of changes! One notable result is that the last two teams in the Alpha bracket for December Group B (Inferno and Red Velvet Sea, 7th and 8th on Sendou) would fall to 9th and 13th, and two teams from Beta (Celestial and Distortion) would move up to Alpha.

Tiebreaker formats: the gory details

Since many teams will have the same record in a six-round Swiss bracket, tiebreakers are needed to determine the standings. On Battlefy, we had been using their default tiebreaker method, which was poorly documented, but basically applied the following, in order:1

  • each team’s opponents’ set2 winning percentage (OW%)
  • each team’s game winning percentage (W_g%)
  • each team’s opponents’ opponents’ set winning percentage (OOW%).

Sendou’s tiebreaker method, meanwhile, was:

  • each team’s number of losses to tied opponents (-1 “points” per loss, so fewer is better) (TB)
  • each team’s number of game wins (W_g)
  • each team’s (setwise) Buchholz score (Buch)
  • each team’s (gamewise) Buchholz score (Buch_g).
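To make the two orderings concrete, here is a minimal Python sketch (not the R script used to build the tables) that sorts by set wins and then by each platform’s tiebreakers in the order listed. The rows are copied from the tables above and restricted to 5-1 teams; the function and variable names are illustrative, and dropped teams are ignored.

```python
# Sendou-style rows: (team, set wins, TB, W_g, Buch, Buch_g),
# copied from the November Group A table and deliberately shuffled.
nov_group_a = [
    ("Celestial", 5, -1, 13, 20, 57),
    ("UCSD Esports Gold", 5, -1, 13, 21, 61),
    ("DISCLAIMER*", 5, 0, 13, 22, 64),
    ("Jet Lag!", 5, -1, 14, 21, 59),
    ("Mulch Enjoyers", 5, 0, 14, 22, 60),
]

# Battlefy-style rows: (team, set wins, OW%, W_g%, OOW%),
# copied from the December Group B table and deliberately shuffled.
dec_group_b = [
    ("Ghistra", 5, 55.556, 72.222, 54.111),
    ("Remix", 5, 56.667, 72.222, 56.884),
    ("Inkception", 5, 58.278, 66.667, 56.317),
    ("THIS IS FINE", 5, 58.333, 72.222, 57.181),
]


def sendou_key(row):
    """Set wins, then TB, W_g, Buch, Buch_g (higher is better for all)."""
    _, wins, tb, game_wins, buch, buch_g = row
    return (-wins, -tb, -game_wins, -buch, -buch_g)


def battlefy_key(row):
    """Set wins, then OW%, W_g%, OOW% (higher is better for all)."""
    _, wins, ow, game_win_pct, oow = row
    return (-wins, -ow, -game_win_pct, -oow)


sendou_order = [row[0] for row in sorted(nov_group_a, key=sendou_key)]
battlefy_order = [row[0] for row in sorted(dec_group_b, key=battlefy_key)]
print(sendou_order)
print(battlefy_order)
```

Sorting this way reproduces rows 1 to 5 of the November Group A table and rows 2 to 5 of the December Group B table.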

A team’s Buchholz score is the sum of the wins of all of that team’s opponents (set wins or game wins, depending). This is similar, but not identical, to OW%. Both are attempting to measure “strength of schedule,” or how good your opponents were. Immediately, we can see two big differences:

  1. Strength of schedule comes before game win rate on Battlefy, but comes after game wins on Sendou.
  2. Sendou has a tiebreaker that doesn’t exist on Battlefy (losses to tied opponents), and it applies first.
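As a rough illustration of the Buchholz definition, the Python sketch below computes setwise and gamewise Buchholz from a list of set results. The six teams and two rounds are hypothetical, and the data layout (winner, loser, winner’s games, loser’s games) is an assumption made for the example, not either platform’s format.

```python
from collections import defaultdict

# Hypothetical results: (winner, loser, winner_games, loser_games).
sets = [
    ("A", "B", 3, 0), ("C", "D", 2, 1), ("E", "F", 2, 1),  # round 1
    ("A", "C", 2, 1), ("E", "B", 3, 0), ("D", "F", 3, 0),  # round 2
]

set_wins = defaultdict(int)
game_wins = defaultdict(int)
opponents = defaultdict(list)

for winner, loser, winner_games, loser_games in sets:
    set_wins[winner] += 1
    game_wins[winner] += winner_games
    game_wins[loser] += loser_games
    opponents[winner].append(loser)
    opponents[loser].append(winner)


def buchholz(team, wins):
    """Sum of `wins` (set wins or game wins) over every opponent the team faced."""
    return sum(wins[opponent] for opponent in opponents[team])


buch = {team: buchholz(team, set_wins) for team in opponents}
buch_g = {team: buchholz(team, game_wins) for team in opponents}
print(buch)
print(buch_g)
```

In this made-up bracket, A and E are both 2-0, but A’s opponents won one set between them and E’s won none, so A has the higher setwise Buchholz.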

Historically, we have liked that OW% comes before W_g% on Battlefy, because the two are somewhat inversely correlated: if you have tougher opponents, you’re less likely to win 3-0 against them. We don’t want to reward a team more for stomping a weak team than for winning a close set against a strong team, and the Battlefy tiebreaker order did a decent job of accounting for that. On Sendou, these are effectively flipped (W_g comes before Buch), so a sweep of a weak team counts for more than a close win over a strong one.

The impact of the losses-to-tied-opponents (TB) column is more complicated. The motivating rationale for this is “you shouldn’t finish above a team who beat you, if you both end up with the same record.” This is sensible, and isn’t accounted for at all by Battlefy’s method.3 However, it doesn’t actually ensure that a team will finish ahead of a tied team that they beat head-to-head: TB only counts losses, so if both teams have the same number of losses to tied opponents, the next tiebreaker decides, regardless of who won the set between them.4
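The TB column can be sketched in a few lines of Python. The teams P, Q, R, and S, their records, and the helper name are all hypothetical; the example is built so that P beats Q head-to-head, yet both end up with the same TB value.

```python
def losses_to_tied(team, results, set_wins):
    """TB value: -1 for each set lost to a team that finished with the same set wins."""
    return -sum(
        1
        for winner, loser in results
        if loser == team and set_wins[winner] == set_wins[team]
    )


# Hypothetical final records and the sets that matter: (winner, loser).
set_wins = {"P": 5, "Q": 5, "R": 5, "S": 6}
game_wins = {"P": 13, "Q": 14, "R": 13}
results = [("P", "Q"), ("R", "P"), ("S", "R")]

tied_teams = ["P", "Q", "R"]
tb = {team: losses_to_tied(team, results, set_wins) for team in tied_teams}

# Order the tied teams by TB, then by game wins (higher is better for both).
order = sorted(tied_teams, key=lambda team: (-tb[team], -game_wins[team]))
print(tb)
print(order)
```

R’s only loss was to a 6-0 team, so R has a TB of 0 and goes first. P and Q each lost once to a tied team, so both sit at -1 and game wins decide: Q finishes ahead of P despite losing to P.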

What about drops? What about byes?

Drops cause lots of problems for Swiss brackets, and the tiebreaker methods are not immune to this. You may have noticed that Battlefy’s methods all depend on rates, whereas Sendou’s depend on raw counts, and I suspect that drops are why. If everybody completes the bracket and the bracket has an even number of participants, OW% and Buchholz are equivalent, setting aside the floor described in footnote 1 (you can convert by dividing Buchholz by the total number of sets played by a team’s opponents). But if someone drops, or if byes have to be used, they diverge (drops, of course, are also frequently why byes end up being required).

Battlefy’s OW% method is “generous,” in that it effectively assumes that every team’s winning percentage is accurate, even if they didn’t play very many sets. A 1-0 drop will count the same as a 6-0 team, 1-1 will be equivalent to 3-3, etc. This, combined with the floor they use (see footnote 1), means that their OW% tends to be higher than what you’d get by, say, calculating it as “total opponent wins / total opponent sets played”, though it isn’t always higher. It is never lower than the “effective OW%” of Buchholz, though. Because Buchholz isn’t normalized by number of sets played, if your opponents played fewer sets than normal (because you had a bye and thus had fewer opponents, or some of them had byes, or some of them dropped, etc.), you “lose out” on chances to get Buchholz points.5
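The three ways of measuring strength of schedule discussed here can be compared with a short Python sketch. It follows the descriptions in footnotes 1 and 5; the opponent records are hypothetical, the function names are illustrative, and every opponent is assumed to have played at least one set.

```python
FLOOR = 0.33  # exactly 0.33, not 1/3 (see footnote 1)


def battlefy_ow(opponents):
    """Average of each opponent's set win rate, with low rates raised to FLOOR."""
    rates = [max(wins / (wins + losses), FLOOR) for wins, losses in opponents]
    return sum(rates) / len(rates)


def pooled_ow(opponents):
    """The "naive" version: total opponent set wins / total opponent sets played."""
    total_wins = sum(wins for wins, _ in opponents)
    total_sets = sum(wins + losses for wins, losses in opponents)
    return total_wins / total_sets


def effective_ow(opponents, max_sets=36):
    """Setwise Buchholz divided by the maximum possible opponent sets (footnote 5)."""
    return sum(wins for wins, _ in opponents) / max_sets


# Hypothetical opponents as (set wins, set losses); the last one dropped at 1-0.
opponents = [(5, 1), (4, 2), (3, 3), (3, 3), (1, 5), (1, 0)]

ow = battlefy_ow(opponents)
pooled = pooled_ow(opponents)
effective = effective_ow(opponents)
print(f"{ow:.3f} {pooled:.3f} {effective:.3f}")
```

For these made-up opponents the Battlefy-style value is about 0.638, the pooled value about 0.548, and the Buchholz-based value about 0.472. The 1-0 drop counts as a perfect record in the first, as one set out of 31 in the second, and as one win out of a possible six in the third.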

The TB column is also sensitive to the issue of teams dropping early. If you lose to a team that doesn’t finish, they can’t end up with the same record as you, so that loss won’t count against you for this tiebreaker. This means that you avoid getting dinged for that loss solely because the other team dropped out early, even if they could have tied with you if they had completed the bracket!

Byes have to be considered on their own, as well. Battlefy internally scores byes as a 2-0 victory over a null team; Sendou scores them as 3-0, which makes them just as good as sweeping a real team, and a lot better than beating a real team 2-1. By making them “2-0” victories, Battlefy makes them worse than a sweep of a real team, and only a little bit “better” than a 2-1 victory (and because OW% comes before W_g% in their method, they’re probably overall worse than a 2-1 win over a real team that can contribute to your OW%). Most Swiss bracket setups I’ve seen count byes as “less than a win” in various ways, on the principle that real wins should outweigh “fake” ones.
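A quick arithmetic check of the bye scoring, as a Python sketch. The team’s five real sets are hypothetical, and the percentage is computed as game wins divided by games played, as described in footnote 2.

```python
def game_record(real_sets, bye=None):
    """Total (game wins, game losses) over real sets plus an optional bye score."""
    wins = sum(w for w, _ in real_sets) + (bye[0] if bye else 0)
    losses = sum(l for _, l in real_sets) + (bye[1] if bye else 0)
    return wins, losses


def win_pct(record):
    """Game winning percentage: game wins / games played, times 100."""
    wins, losses = record
    return 100 * wins / (wins + losses)


# Hypothetical team: four 2-1 wins and one 1-2 loss in its five real sets.
five_sets = [(2, 1), (2, 1), (2, 1), (2, 1), (1, 2)]

battlefy_bye = game_record(five_sets, bye=(2, 0))  # sixth round is a 2-0 bye
sendou_bye = game_record(five_sets, bye=(3, 0))    # sixth round is a 3-0 bye
close_win = game_record(five_sets + [(2, 1)])      # sixth round is a real 2-1 win
sweep = game_record(five_sets + [(3, 0)])          # sixth round is a real 3-0 win

print(battlefy_bye, round(win_pct(battlefy_bye), 1))
print(close_win, round(win_pct(close_win), 1))
print(sweep, round(win_pct(sweep), 1))
print(sendou_bye)
```

Under these assumptions, the Battlefy bye gives a W_g% of about 64.7, which sits between a real 2-1 win (61.1) and a real sweep (66.7). The Sendou bye gives 12 game wins, the same W_g as a real sweep and one more than a real 2-1 win.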

Hypothetical New Formats

The Low Ink TOs already discussed the tiebreakers earlier this week, and we agreed that, at the very least, moving Buch before W_g is a good idea. We could also change Buch to be closer to OW%, to account for drops more elegantly. My R script can sort the standings however I want, and I am tempted to print even more tables for these alternate methods, but this write-up is already too long as it is. Maybe another time…

Footnotes

  1. My naive idea for how to calculate OW% was “total opponent set wins / total opponent sets played”, but that’s not what they do. Instead, they take each opponent’s (set) win rate, change any that are below 0.33 to 0.33 (exactly 0.33, not 1/3), and average those win rates together. The “floor” is intended to lower the impact of playing a particularly low-scoring team, I guess. OOW% is calculated the same way, but for all of the opponents of a given team’s opponents.

  2. Nobody uses consistent nomenclature for rounds/matches/games/sets or what have you. I’m going to say that a set is a set of 3 games played against a single opponent in a given round of the bracket. So your set wins, which is the first thing that determines standings, is how many sets you won. Your game wins is how many individual games you won across all of your sets. Your set/game winning percentages are your set/game wins divided by the total number of sets/games you played.

  3. In Group A in November, FUIT GUMMY (Rebirth) finished above both teams they lost to, THIS IS FINE and Dkreef, despite all three teams finishing 4-2. This gave them the top seed in the Beta bracket. The other two teams each lost one set to a 5-1 opponent and one set to a 3-3 opponent. In a vacuum, I’m not sure if “two losses to 4-2 teams” or “a loss to a 5-1 team and a loss to a 3-3 team” should be preferred, though.

  4. In the November Group A bracket, UCSD Esports Gold defeated Jet Lag!, and both finished with a 5-1 record. Battlefy’s tiebreakers put UCSD Esports Gold at 3rd and Jet Lag! at 4th, but Sendou’s tiebreakers would have swapped them, even though UCSD beat Jet Lag. It should be possible to program a tiebreaker that accomplishes the goal of this one, but it’d involve a lot of pairwise comparisons, and I’m not sure how exactly to display it in a table on the website.

  5. If it helps, you can “convert” Buchholz to “effective OW%” by dividing the total opponent set wins by the maximum number of possible opponent sets played (36, in our case), no matter how many they actually did play.