Tiebreaker format comparisons: Battlefy vs. Sendou

Author

glumbaron

Published

December 19, 2024

Introduction

In December 2024, IPL switched from Battlefy to sendou.ink for Low Ink, our large skill-capped monthly tournament. This largely went well, with some wrinkles here and there that we’ve been working out with Sendou since then. One of the most outwardly noticeable consequences of the switch was the new tiebreaker format. Low Ink uses a Swiss bracket for Day 1, and the tiebreaker method used with it has a big effect on which teams make which Day 2 brackets. Here, I want to analyze the effects of this change by recalculating each platform’s tiebreakers and applying them to the standings for Low Ink December (our first Sendou month) and November (our final Battlefy month).

Results

The simplest way to show the impact of the different tiebreaker formats is to re-sort each tournament’s standings using the other platform’s tiebreakers, to see what would have been different if it had been run on the other platform. So I’m going to put a bunch of tables in here. If you want to read my explanation of the differences, jump to the section “Tiebreaker formats: the gory details.”

First, if November Low Ink had been run on Sendou with its tiebreaker format:

Low Ink November Group A, with Sendou tiebreakers. The `#` column is the rank under Sendou’s tiebreakers; the `Rank` column is the original rank on Battlefy.
#	Rank	Team	W	L	TB	W_g	L_g	Buch	Buch_g	Dropped
1	1	Mulch Enjoyers	5	1	0	14	4	22	60	FALSE
2	2	DISCLAIMER*	5	1	0	13	5	22	64	FALSE
3	4	Jet Lag!	5	1	-1	14	4	21	59	FALSE
4	3	UCSD Esports Gold	5	1	-1	13	5	21	61	FALSE
5	5	Celestial	5	1	-1	13	5	20	57	FALSE
6	8	Tidal Disruption	4	2	0	13	5	21	60	FALSE
7	11	THIS IS FINE	4	2	0	13	5	18	52	FALSE
8	6	Remix	4	2	0	11	7	26	72	FALSE
9	12	Dkreef	4	2	0	11	7	19	59	FALSE
10	14	Inkception	4	2	0	11	7	15	46	FALSE
11	10	Moonshine	4	2	-1	13	5	20	53	FALSE
12	13	Microwave Mode	4	2	-1	12	6	18	53	FALSE
13	7	Oracle	4	2	-1	11	7	22	64	FALSE
14	9	FUIT GUMMY (Rebirth)	4	2	-2	14	4	20	56	FALSE
15	17	XNAME	3	3	0	11	7	21	58	FALSE
16	15	Aurora	3	3	0	10	8	23	67	FALSE
17	25	Brigada Woomy	3	3	0	9	9	15	45	FALSE
18	19	Butterfly Effect	3	3	0	8	10	19	60	FALSE
19	21	EWfish	3	3	0	8	10	17	54	FALSE
20	16	Tidal Tempest	3	3	0	7	11	22	63	FALSE
21	18	Guess What!?	3	3	0	6	12	21	71	FALSE
22	20	Krill Switch	3	3	-1	9	9	19	58	FALSE
23	26	RIT Orange Kois	3	3	-2	12	6	13	37	FALSE
24	24	Sea Flat	3	3	-2	10	8	15	44	FALSE
25	22	Killer Blues	3	3	-2	9	9	17	51	FALSE
26	23	Auraboros	3	3	-2	8	10	16	51	FALSE
27	28	IGNITE	2	4	0	8	10	19	58	FALSE
28	34	Nah I’d Ink	2	4	0	8	10	14	39	FALSE
29	27	Sour Patch Squids	2	4	0	6	12	20	59	FALSE
30	29	Trident	2	4	0	4	14	18	56	FALSE
31	31	Little Timmy and the Splats	2	4	-1	8	10	16	52	FALSE
32	33	⋆ 𖥨. ݁Sea Angels ꒱ ֺЮৎ	2	4	-1	7	10	13	46	FALSE
33	30	Kiwi Ice Cream	2	4	-1	5	13	17	53	FALSE
34	32	Soliloquy of the Stars	2	4	-1	5	13	16	57	FALSE
35	35	octo rehab	2	4	-2	7	11	13	33	FALSE
36	38	RIT Black Seadragons	1	5	0	5	13	12	32	FALSE
37	36	Moray Eels	1	5	0	2	15	15	49	FALSE
38	37	UMassD Goop Squad	1	5	-2	2	15	12	40	FALSE
39	40	Powdermelon Penguins	1	5	0	5	13	14	42	TRUE
40	39	Malevolent Cooler	1	2	0	2	7	11	34	TRUE

Low Ink November Group B, with Sendou tiebreakers. The `#` column is the rank under Sendou’s tiebreakers; the `Rank` column is the original rank on Battlefy.
#	Rank	Team	W	L	TB	W_g	L_g	Buch	Buch_g	Dropped
1	1	Tidy Tidings	6	0	0	15	3	20	59	FALSE
2	2	Inkbound	5	1	0	14	4	24	66	FALSE
3	3	REWIND!!!	5	1	0	13	5	21	62	FALSE
4	4	Black Squid Ink Burger	5	1	-1	13	5	20	62	FALSE
5	6	Reload	4	2	0	13	5	23	65	FALSE
6	10	I love latinas Sonic	4	2	0	13	5	20	61	FALSE
7	5	Sonora Inkamita de Iztapalapa	4	2	0	11	7	24	65	FALSE
8	7	Fearless	4	2	-1	12	6	22	64	FALSE
9	12	Asphyxiation	4	2	-1	12	6	19	57	FALSE
10	14	Angelink	4	2	-1	12	6	16	51	FALSE
11	8	Los Inklings and Friends	4	2	-1	11	7	22	60	FALSE
12	9	Malicious Misery	4	2	-1	11	7	21	61	FALSE
13	11	FreakForce	4	2	-1	8	10	20	62	FALSE
14	13	Ghistra	4	2	-2	11	7	18	58	FALSE
15	16	Starfish	3	3	0	10	8	22	62	FALSE
16	17	Determination	3	3	0	10	8	21	64	FALSE
17	15	Squid Rollups	3	3	0	9	9	23	69	FALSE
18	25	Haihappen	3	3	0	9	9	13	42	FALSE
19	23	KANA ON TOP!!!!	3	3	0	8	10	16	51	FALSE
20	21	Coconut Crabs	3	3	-1	11	7	14	42	FALSE
21	24	All I Want for Christmas	3	3	-1	11	7	14	39	FALSE
22	18	YT->MP4	3	3	-1	10	8	19	55	FALSE
23	20	Turbo Torben	3	3	-1	10	8	17	46	FALSE
24	19	Kurain’s Grasp	3	3	-1	9	9	17	47	FALSE
25	26	Astral Ink	3	3	-1	8	10	12	35	FALSE
26	22	Red Velvet Sea	3	3	-2	10	8	16	46	FALSE
27	27	Arrows of Gale	2	4	0	7	11	21	62	FALSE
28	28	Clownfish	2	4	0	7	11	20	58	FALSE
29	29	HP OfficeJet Pro 6962 All-In-One Printer	2	4	0	5	13	19	62	FALSE
30	31	Octaves	2	4	0	5	13	16	54	FALSE
31	32	Cardinal Cephalopods	2	4	0	4	14	16	50	FALSE
32	30	No Crosswords	2	4	-1	7	11	16	52	FALSE
33	34	Death by 100 papercuts	2	4	-2	7	11	14	43	FALSE
34	35	Reignfall	2	4	-2	7	11	14	43	FALSE
35	36	Squidlife Crisis	2	4	-2	7	11	10	30	FALSE
36	33	AD MARE 🪼	2	4	-2	6	12	15	44	FALSE
37	38	Six-Pack Yokes	1	5	0	4	14	13	46	FALSE
38	37	Washed Inktrails	1	5	-1	5	12	13	40	FALSE
39	39	Octowusses	0	6	0	0	18	20	61	FALSE
40	40	R1PT1D3	1	4	0	4	11	17	49	TRUE

Next, if December Low Ink had been run on Battlefy with its tiebreaker format:

Low Ink December Group A, with Battlefy tiebreakers. The `#` column is the rank under Battlefy’s tiebreakers; the `Rank` column is the original rank on Sendou.
#	Rank	Team	W	L	OW%	W_g%	OOW%	Dropped
1	1	Serves up	6	0	58.333	83.333	56.640	FALSE
2	2	FlipSide	5	1	72.222	77.778	55.307	FALSE
3	3	Ball up top	5	1	61.111	88.889	52.526	FALSE
4	4	Star Allies	5	1	61.111	72.222	56.015	FALSE
5	5	Exam Week	4	2	66.667	61.111	53.167	FALSE
6	7	Squid Rollups	4	2	63.833	55.556	54.304	FALSE
7	12	Event Horizon	4	2	61.111	55.556	52.940	FALSE
8	8	idiots in chat	4	2	55.556	72.222	53.795	FALSE
9	9	I love Latinas Sonic	4	2	55.500	61.111	59.406	FALSE
10	13	Calamity Forge Neo	4	2	55.500	55.556	51.361	FALSE
11	6	FREAKFORCE	4	2	52.722	61.111	53.594	FALSE
12	11	Los Inklings	4	2	49.944	61.111	55.043	FALSE
13	10	Malicious Misery	4	2	47.167	61.111	51.827	FALSE
14	14	Fearless	3	3	63.889	55.556	55.528	FALSE
15	15	Tidal Tempest	3	3	58.333	55.556	52.985	FALSE
16	19	11 ft 8 Bridge	3	3	55.556	61.111	50.653	FALSE
17	21	EWfish	3	3	55.556	50.000	55.307	FALSE
18	18	Octaves	3	3	55.500	44.444	57.307	FALSE
19	16	Tocinitos de cielo	3	3	52.722	50.000	53.580	FALSE
20	22	Gubi Fortnite?!	3	3	47.222	50.000	49.117	FALSE
21	20	squid squad	3	3	47.167	61.111	48.507	FALSE
22	24	XName	3	3	47.111	55.556	57.905	FALSE
23	17	Six-Pack Yokes	3	3	44.389	50.000	49.939	FALSE
24	25	Inkbound	3	3	41.667	50.000	51.470	FALSE
25	23	Milkyway	3	3	41.611	44.444	51.470	FALSE
26	30	Reignfall	2	4	58.278	33.333	49.944	FALSE
27	26	TAMU Maroon	2	4	55.556	50.000	56.372	FALSE
28	29	Zero Magnum	2	4	55.556	33.333	51.391	FALSE
29	27	Bad at Math	2	4	44.389	38.889	50.825	FALSE
30	32	octo rehab	2	4	44.333	38.889	50.860	FALSE
31	28	Paladins	2	4	44.333	38.889	48.556	FALSE
32	31	Cod in 4K	2	4	41.611	33.333	54.730	FALSE
33	33	The British Empire	1	5	47.167	22.222	49.307	FALSE
34	34	Job Application	0	6	38.833	0.000	52.736	FALSE
35	35	Moonshine	3	3	49.944	61.111	57.127	TRUE
36	36	[Name Pending]	2	4	46.667	27.778	52.704	TRUE
37	37	Nah I’d ink	1	3	58.333	33.333	53.667	TRUE
38	38	morning struggles	1	4	45.833	20.000	48.193	TRUE
39	39	Moonslice	0	2	58.333	16.667	53.000	TRUE
40	40	Overdive	0	1	50.000	0.000	49.944	TRUE

Low Ink December Group B, with Battlefy tiebreakers. The `#` column is the rank under Battlefy’s tiebreakers; the `Rank` column is the original rank on Sendou.
#	Rank	Team	W	L	OW%	W_g%	OOW%	Dropped
1	1	Jet Lag!	6	0	62.778	83.333	56.788	FALSE
2	5	THIS IS FINE	5	1	58.333	72.222	57.181	FALSE
3	4	Inkception	5	1	58.278	66.667	56.317	FALSE
4	2	Remix	5	1	56.667	72.222	56.884	FALSE
5	3	Ghistra	5	1	55.556	72.222	54.111	FALSE
6	9	Celestial	4	2	66.611	61.111	53.000	FALSE
7	10	Distortion	4	2	63.889	77.778	55.242	FALSE
8	6	Caniac Central	4	2	63.778	72.222	56.983	FALSE
9	7	Inferno	4	2	60.000	66.667	54.000	FALSE
10	13	Starfish	4	2	52.722	55.556	55.206	FALSE
11	12	Crisis Averted	4	2	52.722	55.556	52.985	FALSE
12	11	Pixel per ink	4	2	44.444	61.111	51.696	FALSE
13	8	Red Velvet Sea	4	2	43.278	66.667	54.133	FALSE
14	15	Twister Time	3	3	65.000	50.000	53.580	FALSE
15	16	o7 Honor Bound	3	3	61.111	50.000	55.739	FALSE
16	17	seasick	3	3	61.000	50.000	54.730	FALSE
17	18	Error 404	3	3	54.389	55.556	55.754	FALSE
18	20	Sheer Cold	3	3	52.667	44.444	55.727	FALSE
19	21	Cloud Strikers!	3	3	49.944	38.889	46.042	FALSE
20	19	Brigada Woomy	3	3	47.111	55.556	52.016	FALSE
21	14	Krill Streak	3	3	45.500	66.667	50.730	FALSE
22	22	Loud Noises	3	3	44.444	55.556	55.492	FALSE
23	23	Tide Breaker	2	4	52.722	44.444	49.942	FALSE
24	25	Deep Sea International	2	4	52.667	38.889	48.058	FALSE
25	26	Pringle Cat	2	4	49.889	38.889	54.806	FALSE
26	28	Cooler High	2	4	45.500	38.889	50.825	FALSE
27	27	Baguette squad	2	4	38.722	33.333	46.286	FALSE
28	24	The Slam Blitzers	2	4	38.667	44.444	50.712	FALSE
29	29	C-50	1	5	36.056	11.111	48.058	FALSE
30	30	Starstrikerz	0	6	33.222	5.556	43.391	FALSE
31	31	Tidy Tidings	3	2	73.333	66.667	57.877	TRUE
32	32	REWIND!!!	3	3	55.556	38.889	51.152	TRUE
33	39	Cosmic Hunters	1	1	41.667	25.000	45.111	TRUE
34	33	Asphyxiation	2	3	60.000	53.333	55.800	TRUE
35	34	Chirpy Chips Crusaders	2	4	52.722	38.889	47.238	TRUE
36	37	Fish Paste	1	3	58.333	25.000	48.778	TRUE
37	38	SplatSCAD-Orange	1	3	50.000	25.000	56.600	TRUE
38	35	Squid’s Rendezvous	1	5	50.000	27.778	49.931	TRUE
39	36	TAMU White	1	5	41.611	22.222	48.464	TRUE
40	40	Citronnade	0	2	66.667	0.000	59.933	TRUE

We can also show the differences between methods graphically, by plotting new ranking vs. old ranking for each group. I’ve used dashed lines to highlight the 1:1 correspondence line as well as the Alpha/Beta/Gamma bracket cutoffs in each plot.

Lots of changes! One notable result is in December Group B: Inferno and Red Velvet Sea, the last two teams in Alpha bracket, fall to 9th and 13th, while Celestial and Distortion move up from Beta to Alpha.

Tiebreaker formats: the gory details

Since many teams will have the same record in a six-round Swiss bracket, tiebreakers need to be employed to figure out the standings. On Battlefy, we had been using their default tiebreaker method, which was somewhat badly documented, but was basically:¹

each team’s opponents’ set² winning percentage (OW%)
each team’s game winning percentage (W_g%)
each team’s opponents’ opponents’ set winning percentage (OOW%).
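Following the description in footnote 1, here is a minimal Python sketch of OW% and OOW%. The function names and the `opponents`/`records` dictionaries are a hypothetical structure, not Battlefy’s API, and pooling every opponent’s opponents (the team itself included) for OOW% is one reading of the footnote rather than confirmed behavior. Values are fractions; the tables above show them multiplied by 100.

```python
FLOOR = 0.33  # Battlefy's floor: exactly 0.33, not 1/3 (footnote 1)


def win_rate(record: tuple[int, int]) -> float:
    """Set winning percentage from a (wins, losses) record."""
    wins, losses = record
    played = wins + losses
    return wins / played if played else 0.0


def floored_average(teams: list[str], records: dict[str, tuple[int, int]]) -> float:
    """Average the teams' win rates after raising any rate below FLOOR to FLOOR."""
    rates = [max(win_rate(records[t]), FLOOR) for t in teams]
    return sum(rates) / len(rates) if rates else 0.0


def ow_pct(team: str, opponents: dict[str, list[str]], records) -> float:
    """Opponents' set winning percentage (OW%)."""
    return floored_average(opponents[team], records)


def oow_pct(team: str, opponents: dict[str, list[str]], records) -> float:
    """Opponents' opponents' set winning percentage (OOW%).

    Assumption: every opponent of every opponent is pooled, the team itself included.
    """
    pool = [o2 for o in opponents[team] for o2 in opponents[o]]
    return floored_average(pool, records)


# Hypothetical two-round toy bracket: A beat B and C; B beat D; C beat D.
opponents = {"A": ["B", "C"], "B": ["A", "D"], "C": ["D", "A"], "D": ["C", "B"]}
records = {"A": (2, 0), "B": (1, 1), "C": (1, 1), "D": (0, 2)}

print(ow_pct("B", opponents, records))   # (1.0 + 0.33) / 2, since D's 0.0 is floored
print(oow_pct("A", opponents, records))  # pool is [A, D, D, A]
```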

Sendou’s tiebreaker method, meanwhile, was:

each team’s number of losses to tied opponents (-1 “points” per loss, so fewer is better) (TB)
each team’s number of game wins (W_g)
each team’s (setwise) Buchholz score (Buch)
each team’s (gamewise) Buchholz score (Buch_g).
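Each of these lists amounts to sorting on a tuple of columns, where a later column only matters when everything before it is tied. A minimal Python sketch of Sendou’s order, using the five 5-1 teams from the November Group A table above; `Row` and `sendou_key` are hypothetical names, and dropped teams (which the tables list separately at the bottom) are ignored.

```python
from typing import NamedTuple


class Row(NamedTuple):
    team: str
    w: int       # set wins
    tb: int      # -1 per loss to an opponent with the same final record
    w_g: int     # game wins
    buch: int    # set-wise Buchholz
    buch_g: int  # game-wise Buchholz


def sendou_key(r: Row) -> tuple[int, ...]:
    # Python sorts ascending, so negate every "higher is better" column.
    return (-r.w, -r.tb, -r.w_g, -r.buch, -r.buch_g)


rows = [
    Row("Celestial", 5, -1, 13, 20, 57),
    Row("Jet Lag!", 5, -1, 14, 21, 59),
    Row("Mulch Enjoyers", 5, 0, 14, 22, 60),
    Row("UCSD Esports Gold", 5, -1, 13, 21, 61),
    Row("DISCLAIMER*", 5, 0, 13, 22, 64),
]

for place, r in enumerate(sorted(rows, key=sendou_key), start=1):
    print(place, r.team)
```

This reproduces the `#` order from the table: Mulch Enjoyers, DISCLAIMER*, Jet Lag!, UCSD Esports Gold, Celestial. The two TB 0 teams come first regardless of game wins.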

A team’s Buchholz score is the sum of the wins of all of that team’s opponents: set wins for Buch, game wins for Buch_g. This is similar, but not identical, to OW%. Both attempt to measure “strength of schedule,” or how good your opponents were. Immediately, we can see two big differences:
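To make the definition concrete, here is a minimal Python sketch that tallies set and game wins from a list of sets and then sums them over each team’s opponents. The `(team_a, team_b, games_a, games_b)` tuple format and the toy results are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-round toy bracket; every set is 3 games.
sets = [
    ("A", "B", 2, 1),  # round 1
    ("C", "D", 3, 0),
    ("A", "C", 1, 2),  # round 2
    ("B", "D", 2, 1),
]


def tally(sets):
    """Return set wins, game wins, and opponent lists for every team."""
    set_wins, game_wins = defaultdict(int), defaultdict(int)
    opponents = defaultdict(list)
    for a, b, ga, gb in sets:
        game_wins[a] += ga
        game_wins[b] += gb
        set_wins[a if ga > gb else b] += 1
        opponents[a].append(b)
        opponents[b].append(a)
    return set_wins, game_wins, opponents


def buchholz(team, set_wins, game_wins, opponents):
    """(Buch, Buch_g): summed set wins and game wins of a team's opponents."""
    opps = opponents[team]
    return sum(set_wins[o] for o in opps), sum(game_wins[o] for o in opps)


set_wins, game_wins, opponents = tally(sets)
print(buchholz("A", set_wins, game_wins, opponents))  # (3, 8): B is 1-1 with 3 games, C is 2-0 with 5
```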

Strength of schedule comes before game win rate on Battlefy, but is after game wins on Sendou.
Sendou has a tiebreaker that doesn’t exist on Battlefy (losses to tied opponents), that applies first.

We have historically liked that OW% comes before W_g% on Battlefy, because the two are sort of inversely correlated: if you have tougher opponents, you’re less likely to win 3-0 against them. We don’t want to reward a team more for stomping a weak team than for winning a close set against a strong team, and the Battlefy tiebreaker order did a decent job of accounting for that. On Sendou, these are effectively flipped (W_g comes before Buch), which has the opposite effect.
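The effect of the ordering shows up in real rows. The sketch below takes the five 4-2 teams with a TB of 0 from the November Group A table above (so TB can’t separate them) and sorts them with W_g before Buch, as Sendou does, and then with Buch before W_g. Buch_g is left out because no pair here is tied on both W_g and Buch.

```python
# (team, W_g, Buch) for the 4-2 teams with TB = 0 in November Group A
rows = [
    ("Tidal Disruption", 13, 21),
    ("THIS IS FINE", 13, 18),
    ("Remix", 11, 26),
    ("Dkreef", 11, 19),
    ("Inkception", 11, 15),
]

# Negate so that higher values sort first.
game_wins_first = [t for t, _, _ in sorted(rows, key=lambda r: (-r[1], -r[2]))]
buch_first = [t for t, _, _ in sorted(rows, key=lambda r: (-r[2], -r[1]))]

print(game_wins_first)
# ['Tidal Disruption', 'THIS IS FINE', 'Remix', 'Dkreef', 'Inkception']
print(buch_first)
# ['Remix', 'Tidal Disruption', 'Dkreef', 'THIS IS FINE', 'Inkception']
```

Remix, with the toughest schedule in the group (Buch 26) but only 11 game wins, goes from third to first once strength of schedule is checked first.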

The impact of the losses-to-tied-opponents (TB) column is more complicated. The motivating rationale for this is “you shouldn’t finish above a team who beat you, if you both end up with the same record.” This is sensible, and isn’t accounted for at all by Battlefy’s method.³ However, it doesn’t actually ensure that a team will finish ahead of a tied team that they beat head-to-head.⁴
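A minimal sketch of the TB column, plus one possible head-to-head alternative. The teams P, Q, R, and S and their results are hypothetical, loosely modeled on footnote 4: P beat Q, but P’s only loss is to R, who also finishes 5-1, so P and Q both end up at TB -1 and game wins decide. Counting wins within the tied group (a “mini-table”) is an assumed rule, not part of either platform’s method as described here, and it can still leave ties when results form a cycle.

```python
records = {"P": (5, 1), "Q": (5, 1), "R": (5, 1), "S": (4, 2)}
lost_to = {"P": ["R"], "Q": ["P"], "R": ["S"], "S": ["P", "Q"]}
game_wins = {"P": 13, "Q": 14, "R": 14, "S": 12}


def tb(team):
    """-1 for each loss to an opponent who finished with the same record."""
    return -sum(records[opp] == records[team] for opp in lost_to[team])


def sendou_key(team):
    wins, _ = records[team]
    return (-wins, -tb(team), -game_wins[team])


print(sorted(records, key=sendou_key))  # ['R', 'Q', 'P', 'S']: Q above P


def mini_table_wins(team):
    """Hypothetical rule: wins over other teams with the same record."""
    return sum(
        team in lost_to[opp]
        for opp in records
        if opp != team and records[opp] == records[team]
    )


def head_to_head_key(team):
    wins, _ = records[team]
    return (-wins, -mini_table_wins(team), -game_wins[team])


print(sorted(records, key=head_to_head_key))  # ['R', 'P', 'Q', 'S']
```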

What about drops? What about byes?

Drops cause lots of problems for Swiss brackets, and the tiebreaker methods are not immune to this. You may have noticed that Battlefy’s methods all depend on rates, whereas Sendou’s use raw counts, and I suspect that drops are why. If everybody completes the bracket and the bracket has an even number of participants, OW% and Buchholz are equivalent apart from Battlefy’s 0.33 floor (see footnote 1): you can convert by dividing Buchholz by the total number of sets played by a team’s opponents. But if someone drops, or if byes have to be used, they diverge (drops, of course, are also frequently why byes end up being required). Battlefy’s OW% method is “generous,” in that it effectively assumes that all teams’ winning percentages are accurate, even if they didn’t play very many sets. A 1-0 drop counts the same as a 6-0 team, 1-1 is equivalent to 3-3, and so on. This, combined with the floor, means that their OW% tends to be higher than what you’d get by, say, calculating it as “total opponent wins / total opponent sets played”, though not always. It is never lower than the “effective OW%” of Buchholz, though. Because Buchholz isn’t normalized by number of sets played, if your opponents played fewer sets than normal (because you had a bye and thus had fewer opponents, or some of them had byes, or some of them dropped, etc.), you “lose out” on chances to earn Buchholz points.⁵
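A small worked example of the divergence, in Python. The six opponent records are hypothetical: five opponents finished all six sets, and one dropped after winning its only set. The Battlefy-style OW% follows footnote 1 (a floored average of win rates), the pooled rate is the naive “total opponent wins / total opponent sets played”, and Buch / 36 is footnote 5’s “effective OW%”.

```python
FLOOR = 0.33  # Battlefy's floor from footnote 1

# (wins, losses) for a team's six opponents; the first one dropped at 1-0.
opponent_records = [(1, 0), (3, 3), (4, 2), (2, 4), (5, 1), (3, 3)]

battlefy_ow = sum(
    max(w / (w + l), FLOOR) for w, l in opponent_records
) / len(opponent_records)
pooled_ow = sum(w for w, _ in opponent_records) / sum(w + l for w, l in opponent_records)
buch = sum(w for w, _ in opponent_records)
effective_ow = buch / 36  # 6 opponents x 6 sets, however many they actually played

print(f"Battlefy OW%: {battlefy_ow:.3%}")  # 23/36, about 63.889%
print(f"pooled OW%:   {pooled_ow:.3%}")    # 18/31, about 58.065%
print(f"Buch {buch} -> effective OW%: {effective_ow:.3%}")  # 18/36 = 50.000%
```

The 1-0 drop contributes a full 100% to the Battlefy average, but only one win to Buchholz.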

The TB column is also sensitive to teams dropping early. If you lose to a team that doesn’t finish, they can’t end up with the same record as you, so that loss won’t count against you for this tiebreaker. In other words, you escape the penalty for that loss solely because the other team dropped out early, even if they could have tied with you had they completed the bracket!
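A tiny illustration, with hypothetical teams: Q’s only loss is to X, who dropped after three rounds at 3-0. Since a dropped team’s record can’t match a full six-set record, that loss never counts toward Q’s TB; if X had played on and also finished 5-1, it would.

```python
def tb(team, records, lost_to):
    """-1 for each loss to an opponent who finished with the same record."""
    return -sum(records[opp] == records[team] for opp in lost_to[team])


lost_to = {"Q": ["X"]}

dropped = {"Q": (5, 1), "X": (3, 0)}   # X dropped after three rounds
finished = {"Q": (5, 1), "X": (5, 1)}  # X plays on and also finishes 5-1

print(tb("Q", dropped, lost_to))   # 0: the loss to X is ignored
print(tb("Q", finished, lost_to))  # -1
```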

Byes have to be considered on their own, as well. Battlefy internally scores byes as a 2-0 victory over a null team; Sendou scores them as 3-0, which makes them just as good as sweeping a real team, and a lot better than beating a real team 2-1. By making them “2-0” victories, Battlefy makes them worse than a sweep of a real team, and only a little bit “better” than a 2-1 victory (and because OW% comes before W_g% in their method, they’re probably overall worse than a 2-1 win over a real team that can contribute to your OW%). Most Swiss bracket setups I’ve seen count byes as “less than a win” in various ways, on the principle that real wins should outweigh “fake” ones.
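Some quick arithmetic shows where a Battlefy bye lands in W_g%. The starting point is hypothetical: a team that has won 9 of 15 games across five real sets, whose sixth result is either a Battlefy-style 2-0 bye, a real 3-0 sweep, or a real 2-1 win. This only compares game totals; it leaves out OW%, where a bye can’t help you the way a real opponent can.

```python
from fractions import Fraction

BASE_WON, BASE_PLAYED = 9, 15  # hypothetical: games won/played in five real sets


def game_win_pct(won: int, lost: int) -> Fraction:
    """W_g% after adding one more set with the given game score."""
    return Fraction(BASE_WON + won, BASE_PLAYED + won + lost)


outcomes = {
    "real 3-0 sweep": (3, 0),
    "Battlefy bye (2-0)": (2, 0),
    "real 2-1 win": (2, 1),
}

for label, (won, lost) in outcomes.items():
    pct = game_win_pct(won, lost)
    print(f"{label:20} W_g% = {float(pct):.1%}, W_g = {BASE_WON + won}")
```

The bye (11/17, about 64.7%) lands between the sweep (12/18, about 66.7%) and the 2-1 win (11/18, about 61.1%). A Sendou-style 3-0 bye would add the same 12 game wins as the real sweep.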

Hypothetical New Formats

The Low Ink TOs already discussed the tiebreakers earlier this week, and we agreed that, at the very least, moving Buch before W_g is a good idea. We could also change Buch to be closer to OW%, to account for drops more elegantly. My R script can sort the standings however I want, and I am tempted to print even more tables for these alternate methods, but this write-up is already too long as it is. Maybe another time…

Footnotes

My naive idea for how to calculate OW% was “total opponent set wins / total opponent sets played”, but that’s not what they do. Instead, they take each opponent’s (set) win rates, change any that are below 0.33 to 0.33 (exactly 0.33, not 1/3), and average those win rates together. The “floor” is intended to lower the impact of playing a particularly low-scoring team, I guess. OOW% is calculated the same way, but for all of the opponents of a given team’s opponents.↩︎
Nobody uses consistent nomenclature for rounds/matches/games/sets or what have you. I’m going to say that a set is a set of 3 games played against a single opponent in a given round of the bracket. Your set wins, which are the first thing that determines standings, are how many sets you won. Your game wins are how many individual games you won across all of your sets. Your set/game winning percentages are your set/game wins divided by the total number of sets/games you played.↩︎
In Group A in November, FUIT GUMMY (Rebirth) finished above both teams they lost to, THIS IS FINE and Dkreef, despite all three teams finishing 4-2. This gave them the top seed in the Beta bracket. The other two teams lost one set each to a 5-1 and a 3-3 opponent. In a vacuum, I’m not sure if “two losses to 4-2 teams” or “a loss to a 5-1 team and a loss to a 3-3 team” should be preferred, though.↩︎
In the November Group A bracket, UCSD Esports Gold defeated Jet Lag!, and both finished with a 5-1 record. Battlefy’s tiebreakers put UCSD Esports Gold at 3rd and Jet Lag! at 4th, but Sendou’s tiebreakers would have swapped them, even though UCSD beat Jet Lag!. Both teams have a TB of -1 (Jet Lag! for its loss to UCSD, and UCSD for its only loss, which was to another 5-1 team), so the next tiebreaker, game wins, decides it, and Jet Lag! (14) edges out UCSD (13). It should be possible to program a tiebreaker that accomplishes the goal of this one, but it’d involve a lot of pairwise comparisons, and I’m not sure how exactly to display it in a table on the website.↩︎
If it helps, you can “convert” Buchholz to “effective OW%” by dividing the total opponent set wins by the maximum number of possible opponent sets played (36 in our case: six opponents with six sets each), no matter how many they actually did play.↩︎