[localhost:~]$ cat * > /dev/ragfield

Tuesday, May 5, 2009

Tri the Illini swim analysis

On Saturday I participated in the Tri the Illini triathlon on the University of Illinois campus. You can read all about the race here. One of the interesting things about this race is that participants were started 10 seconds apart in order of their estimated time for the 300-meter swim in the indoor pool. In theory, if everyone swims at their estimated time, nobody will have to pass anyone else in the pool. Now that the results have been posted, let's take a quick look to see how accurate the participants' predictions were.
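The start protocol can be made concrete with a small simulation. This is a hypothetical Python sketch, not part of the original Mathematica analysis, and all function names and swim times in it are made up. It makes three assumptions:

- the swimmer with seed k starts 10·(k - 1) seconds after the first starter;
- each swim time is measured from that swimmer's own start;
- swim places are ranked by swim time, as the results' swim "Rnk" column appears to be.

A pass is counted whenever a later starter leaves the pool before an earlier one.

```python
# Hypothetical sketch: staggered 10-second starts and counting in-pool passes.
START_INTERVAL_S = 10.0  # seconds between consecutive starters


def exit_times(swim_times):
    """Clock time at which each swimmer leaves the pool.

    swim_times[i] is the swim duration (seconds, measured from the swimmer's
    own start) of the swimmer seeded i + 1, who starts at i * START_INTERVAL_S.
    """
    return [i * START_INTERVAL_S + t for i, t in enumerate(swim_times)]


def count_passes(swim_times):
    """Number of pairs where a later starter exits the pool before an earlier one."""
    exits = exit_times(swim_times)
    passes = 0
    for earlier in range(len(exits)):
        for later in range(earlier + 1, len(exits)):
            if exits[later] < exits[earlier]:
                passes += 1
    return passes


def swim_places(swim_times):
    """1-based rank of each swimmer by swim time (fastest swim = place 1)."""
    order = sorted(range(len(swim_times)), key=lambda i: swim_times[i])
    places = [0] * len(swim_times)
    for rank, i in enumerate(order, start=1):
        places[i] = rank
    return places


if __name__ == "__main__":
    # Everyone swims exactly their estimate, estimates in seed order: no passes.
    print(count_passes([250.0, 260.0, 275.0, 300.0]))  # 0
    # Seed 3 swims 240 s instead of 275 s and overtakes seed 2.
    print(count_passes([250.0, 260.0, 240.0, 300.0]))  # 1
    # Seed 2 posts the faster swim time but never catches seed 1.
    print(count_passes([260.0, 255.0]), swim_places([260.0, 255.0]))  # 0 [2, 1]
```

The last case shows that a swimmer can move ahead of their seed in the swim ranking without passing anyone in the water. Under the assumptions above, places come from swim times rather than from the order in which swimmers leave the pool.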
Import the data from the results web page.
data = Import["http://www.mattoonmultisport.com/images/stories/results/trithetri/overall.htm", {"HTML", "FullData"}];
Clean it up a bit by removing empty elements, labels, and column headers. Basically, we only want the entries with an integer value in the first column (the overall place).
Length[data]
9
data = DeleteCases[data, {}|{{}}];
Length[data]
1
data = First[data];
Length[data]
345
Take[data, 12]//InputForm
{{"", "------- Swim -------", "------- T1 -------",
"------- Bike -------", "------- T2 -------", "------- Run -------",
"Total"}, {"Place", "Name", "Bib No", "Age", "Rnk", "Time", "Pace",
"Rnk", "Time", "Pace", "Rnk", "Time", "Rate", "Rnk", "Time", "Pace",
"Rnk", "Time", "Pace", "Time"}, {1, "Daniel Bretscher", 8, 26, 3,
"04:19.75", "23:59/M", 1, "00:34.00", "", 1, "26:42.95", "24.7mph",
19, "00:44.25", "", 2, "16:09.15", "5:23/M", "48:30.10"},
{2, "Michael Bridenbaug", 27, 25, 15, "04:39.50", "25:50/M", 6,
"00:43.65", "", 5, "28:16.55", "23.3mph", 6, "00:37.55", "", 4,
"16:15.75", "5:25/M", "50:33.00"}, {3, "Peter Garde", 17, 24, 24,
"04:53.05", "27:08/M", 51, "01:21.90", "", 2, "27:06.50", "24.4mph",
109, "01:05.95", "", 3, "16:13.05", "5:24/M", "50:40.45"},
{4, "Nickolaus Early", 16, 29, 2, "04:07.30", "22:52/M", 18,
"00:57.45", "", 4, "27:58.40", "23.6mph", 4, "00:36.40", "", 9,
"18:01.25", "6:00/M", "51:40.80"}, {5, "Zach Rosenbarger", 78, 33, 50,
"05:15.85", "29:10/M", 45, "01:16.75", "", 3, "27:42.00", "23.8mph",
54, "00:53.30", "", 5, "17:11.80", "5:44/M", "52:19.70"},
{6, "Edward Elliot", 32, 28, 11, "04:35.45", "25:28/M", 16, "00:56.15",
"", 6, "28:23.40", "23.3mph", 38, "00:48.85", "", 7, "17:43.75",
"5:54/M", "52:27.60"}, {7, "Ryan Forster", 28, 27, 35, "05:03.95",
"28:03/M", 9, "00:46.20", "", 12, "29:39.95", "22.3mph", 39,
"00:49.05", "", 11, "18:07.00", "6:02/M", "54:26.15"},
{8, "Jun Yamaguchi", 15, 27, 27, "04:58.20", "27:36/M", 2, "00:37.85",
"", 11, "29:36.70", "22.3mph", 30, "00:47.05", "", 13, "18:49.30",
"6:16/M", "54:49.10"}, {9, "Scott Paluska", 63, 42, 71, "05:35.30",
"31:01/M", 3, "00:40.70", "", 7, "28:44.00", "23.0mph", 118,
"01:06.90", "", 12, "18:43.60", "6:14/M", "54:50.50"},
{10, "Rob Raguet-Schoofield", 42, 31, 44, "05:09.75", "28:37/M", 20,
"01:01.55", "", 18, "30:10.50", "21.9mph", 45, "00:51.45", "", 8,
"17:52.60", "5:57/M", "55:05.85"}}
data = DeleteCases[data, x_/;Head[First[x]] === String];
Length[data]
301
places = data[[All, 1]];
places==Range[301]
True
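For readers following along outside Mathematica, the same clean-up step can be sketched in Python. It keeps only the rows with an integer in the first column, as described above. This sketch assumes the results table has already been parsed into a list of rows with the overall place stored as an int; the HTML parsing is not shown, and the function names are hypothetical.

```python
def keep_result_rows(rows):
    """Keep only rows whose first cell is an integer overall place.

    Section-header rows ("", "------- Swim -------", ...) and the column-label
    row ("Place", "Name", ...) start with strings, so they are dropped, as are
    empty rows.
    """
    return [row for row in rows if row and isinstance(row[0], int)]


def places_are_consecutive(rows):
    """Sanity check: overall places are exactly 1, 2, ..., len(rows)."""
    return [row[0] for row in rows] == list(range(1, len(rows) + 1))


if __name__ == "__main__":
    raw = [
        ["", "------- Swim -------", "------- T1 -------"],
        ["Place", "Name", "Bib No", "Age", "Rnk"],
        [1, "Daniel Bretscher", 8, 26, 3],
        [2, "Michael Bridenbaug", 27, 25, 15],
    ]
    results = keep_result_rows(raw)
    print(len(results), places_are_consecutive(results))  # 2 True
```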
swimSeeds = data[[All, 3]]; (* column 3 is the bib number, used here as the swim seed *)
swimPlaces = data[[All, 5]]; (* column 5 is the swim rank *)
swimΔ = swimPlaces - swimSeeds;
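The column extraction and Δ can be sketched the same way in Python; this is an illustration, not the original code. Python indexes from 0, so Mathematica's column 3 (the bib number, which the code above uses as the swim seed) and column 5 (the swim rank) become indices 2 and 4. The example rows are the first three finishers from the results shown above, truncated to their first five cells.

```python
SEED_COL = 2        # Mathematica column 3: "Bib No", used as the swim seed
SWIM_PLACE_COL = 4  # Mathematica column 5: the swim "Rnk"


def swim_deltas(rows):
    """Return (seed, swim place, delta) per row, with delta = swim place - seed.

    A negative delta means the swim place was better than the seed.
    """
    result = []
    for row in rows:
        seed = row[SEED_COL]
        place = row[SWIM_PLACE_COL]
        result.append((seed, place, place - seed))
    return result


if __name__ == "__main__":
    rows = [
        [1, "Daniel Bretscher", 8, 26, 3],
        [2, "Michael Bridenbaug", 27, 25, 15],
        [3, "Peter Garde", 17, 24, 24],
    ]
    for seed, place, delta in swim_deltas(rows):
        print(seed, place, delta)
    # 8 3 -5
    # 27 15 -12
    # 17 24 7
```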
Take a look at {overall place, swim seed, swim place, swim Δ} for each participant. A negative Δ means the participant's swim place was better than their seeded swim place, while a positive Δ means the participant's swim place was worse than their seeded swim place.
Grid[Prepend[Transpose[{places, swimSeeds, swimPlaces, swimΔ}], {"Overall\nPlace", "Swim\nSeed", "Swim\nPlace", "Swim\nΔ"}], Dividers->All]
Overall  Swim  Swim   Swim
Place    Seed  Place  Δ
1        8     3      -5
2        27    15     -12
3        17    24     7
4        16    2      -14
5        78    50     -28
6        32    11     -21
7        28    35     7
8        15    27     12
9        63    71     8
10       42    44     2
11       9     9      0
12       62    31     -31
13       54    105    51
14       30    14     -16
15       200   155    -45
16       14    8      -6
17       83    83     0
18       6     33     27
19       47    53     6
20       66    34     -32
21       40    102    62
22       86    106    20
23       100   75     -25
24       45    59     14
25       120   58     -62
26       4     12     8
27       3     36     33
28       33    13     -20
29       51    25     -26
30       72    115    43
31       12    16     4
32       39    7      -32
33       65    66     1
34       330   240    -90
35       48    64     16
36       21    6      -15
37       13    5      -8
38       68    56     -12
39       212   89     -123
40       214   79     -135
41       67    45     -22
42       76    21     -55
43       75    211    136
44       81    63     -18
45       251   172    -79
46       44    114    70
47       162   95     -67
48       7     4      -3
49       74    92     18
50       55    48     -7
51       52    29     -23
52       237   124    -113
53       11    18     7
54       77    98     21
55       56    30     -26
56       117   104    -13
57       1     1      0
58       300   74     -226
59       151   125    -26
60       139   82     -57
61       274   151    -123
62       64    22     -42
63       122   87     -35
64       36    37     1
65       250   243    -7
66       154   148    -6
67       121   90     -31
68       136   258    122
69       282   146    -136
70       87    69     -18
71       112   110    -2
72       315   278    -37
73       24    19     -5
74       80    77     -3
75       135   184    49
76       69    97     28
77       185   163    -22
78       59    49     -10
79       128   94     -34
80       60    51     -9
81       187   28     -159
82       35    20     -15
83       113   134    21
84       90    40     -50
85       163   17     -146
86       22    38     16
87       133   200    67
88       267   73     -194
89       256   145    -111
90       92    122    30
91       159   206    47
92       165   130    -35
93       137   153    16
94       149   128    -21
95       104   80     -24
96       182   136    -46
97       26    46     20
98       172   93     -79
99       241   55     -186
100      277   109    -168
101      114   165    51
102      97    103    6
103      131   159    28
104      228   194    -34
105      46    117    71
106      141   199    58
107      132   162    30
108      246   132    -114
109      115   205    90
110      213   183    -30
111      127   181    54
112      157   147    -10
113      254   131    -123
114      266   256    -10
115      41    61     20
116      61    39     -22
117      43    41     -2
118      191   196    5
119      106   257    151
120      96    81     -15
121      234   60     -174
122      201   85     -116
123      108   142    34
124      161   218    57
125      123   149    26
126      98    150    52
127      138   271    133
128      265   179    -86
129      195   54     -141
130      50    86     36
131      53    161    108
132      170   137    -33
133      58    26     -32
134      169   120    -49
135      204   249    45
136      342   65     -277
137      91    100    9
138      103   47     -56
139      125   182    57
140      82    68     -14
141      88    76     -12
142      147   241    94
143      238   107    -131
144      303   202    -101
145      70    191    121
146      297   247    -50
147      188   178    -10
148      2     10     8
149      130   164    34
150      311   288    -23
151      168   187    19
152      283   158    -125
153      144   216    72
154      295   287    -8
155      208   244    36
156      287   157    -130
157      111   176    65
158      199   111    -88
159      272   223    -49
160      179   242    63
161      84    190    106
162      310   152    -158
163      109   118    9
164      37    67     30
165      180   227    47
166      340   135    -205
167      328   193    -135
168      263   197    -66
169      312   250    -62
170      148   143    -5
171      242   224    -18
172      278   230    -48
173      253   139    -114
174      292   279    -13
175      10    32     22
176      276   254    -22
177      18    101    83
178      155   167    12
179      271   276    5
180      102   169    67
181      192   248    56
182      257   78     -179
183      93    210    117
184      346   294    -52
185      255   269    14
186      158   192    34
187      49    171    122
188      126   201    75
189      143   140    -3
190      231   88     -143
191      220   186    -34
192      298   177    -121
193      107   144    37
194      94    129    35
195      129   188    59
196      184   274    90
197      116   273    157
198      224   259    35
199      186   214    28
200      174   198    24
201      290   221    -69
202      95    175    80
203      152   215    63
204      189   246    57
205      156   239    83
206      336   141    -195
207      118   174    56
208      197   91     -106
209      194   280    86
210      196   212    16
211      291   121    -170
212      190   189    -1
213      247   119    -128
214      286   126    -160
215      219   52     -167
216      229   170    -59
217      23    23     0
218      236   277    41
219      235   265    30
220      252   204    -48
221      258   281    23
222      99    84     -15
223      211   226    15
224      279   272    -7
225      216   267    51
226      167   116    -51
227      243   275    32
228      262   220    -42
229      230   154    -76
230      288   166    -122
231      337   251    -86
232      269   228    -41
233      233   252    19
234      20    208    188
235      110   185    75
236      308   284    -24
237      261   231    -30
238      245   127    -118
239      343   283    -60
240      160   70     -90
241      150   264    114
242      85    156    71
243      5     173    168
244      347   213    -134
245      153   203    50
246      175   219    44
247      339   298    -41
248      134   290    156
249      203   42     -161
250      348   282    -66
251      289   112    -177
252      124   168    44
253      218   238    20
254      145   207    62
255      119   113    -6
256      183   123    -60
257      299   236    -63
258      73    62     -11
259      302   261    -41
260      29    96     67
261      57    57     0
262      173   225    52
263      281   255    -26
264      31    99     68
265      275   270    -5
266      319   296    -23
267      181   245    64
268      178   301    123
269      89    108    19
270      307   291    -16
271      294   229    -65
272      215   43     -172
273      176   160    -16
274      270   180    -90
275      240   195    -45
276      217   263    46
277      318   302    -16
278      316   303    -13
279      285   237    -48
280      280   297    17
281      177   72     -105
282      207   234    27
283      314   217    -97
284      309   232    -77
285      306   253    -53
286      248   292    44
287      193   289    96
288      296   286    -10
289      227   285    58
290      171   222    51
291      345   304    -41
292      305   262    -43
293      71    268    197
294      264   266    2
295      226   235    9
296      225   209    -16
297      273   295    22
298      244   260    16
299      142   305    163
300      209   306    97
301      313   307    -6
It looks like the race leaders were fairly accurate in their predictions, while the differences start to become greater around 40th place or so.
BarChart[swimΔ, FrameLabel->{"Overall Place", "Swim Δ"}, Frame->{True, True, False, False}]
[Figure: swim Δ by overall place (bar chart)]
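One way to put a number on the observation that differences grow after roughly 40th place is to compare the mean |Δ| of the leaders with that of everyone else. This is a minimal Python sketch, not part of the original analysis. It assumes the Δ values are ordered by overall place, and the cutoff k = 40 simply follows the rough estimate in the text. The example uses only the first four Δ values from the table.

```python
from statistics import mean


def mean_abs_delta_split(deltas, k=40):
    """Mean |delta| of the first k overall finishers and of everyone after them.

    deltas must be ordered by overall place and contain more than k values.
    """
    if not 0 < k < len(deltas):
        raise ValueError("k must leave at least one finisher on each side")
    leaders = [abs(d) for d in deltas[:k]]
    rest = [abs(d) for d in deltas[k:]]
    return mean(leaders), mean(rest)


if __name__ == "__main__":
    first_four = [-5, -12, 7, -14]  # overall places 1-4
    print(mean_abs_delta_split(first_four, k=2))  # (8.5, 10.5)
```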
From the sorted Δs it looks like about half of the participants were within 50 places or so of their seeds, while a few were way off (in both directions).
BarChart[Sort[swimΔ]]
[Figure: swim Δ values sorted in ascending order (bar chart)]
Median[Abs@swimΔ]
41
N[Mean[Abs@swimΔ]]
56.70099667774086
N[StandardDeviation[Abs@swimΔ]]
51.80100030247153
Commonest[Abs@swimΔ]
{16}
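The same summary statistics are available in Python's standard `statistics` module. This sketch is an illustration, not the original analysis, and the function name and sample values are hypothetical. It mirrors the Mathematica calls above on the absolute Δ values:

- `stdev` is the sample standard deviation with an n - 1 denominator, like `StandardDeviation`.
- `multimode` returns every most-frequent value, like `Commonest`.

It also reports the share of participants within a given number of places of their seed. That offers one way to check the "about half within 50 places" reading of the sorted chart.

```python
from statistics import mean, median, multimode, stdev


def summarize_abs_deltas(deltas, within=50):
    """Summary statistics of |delta|, plus the share of swimmers within `within` places."""
    abs_d = [abs(d) for d in deltas]
    return {
        "median": median(abs_d),
        "mean": mean(abs_d),
        "stdev": stdev(abs_d),          # sample standard deviation (n - 1)
        "commonest": multimode(abs_d),  # all values tied for most frequent
        "share_within": sum(d <= within for d in abs_d) / len(abs_d),
    }


if __name__ == "__main__":
    sample = [-5, -12, 7, -14, 5]  # made-up values, not the race data
    stats = summarize_abs_deltas(sample)
    print(stats["median"], stats["mean"], stats["commonest"])  # 7 8.6 [5]
```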
tally = Sort[Tally[Abs@swimΔ]]
{{0, 5}, {1, 3}, {2, 4}, {3, 3}, {4, 1}, {5, 6}, {6, 6}, {7, 6}, {8, 5}, {9, 4}, {10, 5}, {11, 1}, {12, 5}, {13, 3}, {14, 4}, {15, 5}, {16, 10}, {17, 1}, {18, 4}, {19, 3}, {20, 5}, {21, 4}, {22, 6}, {23, 4}, {24, 3}, {25, 1}, {26, 5}, {27, 2}, {28, 4}, {30, 6}, {31, 2}, {32, 4}, {33, 2}, {34, 6}, {35, 4}, {36, 2}, {37, 2}, {41, 5}, {42, 2}, {43, 2}, {44, 3}, {45, 3}, {46, 2}, {47, 2}, {48, 3}, {49, 3}, {50, 3}, {51, 5}, {52, 3}, {53, 1}, {54, 1}, {55, 1}, {56, 3}, {57, 4}, {58, 2}, {59, 2}, {60, 2}, {62, 4}, {63, 3}, {64, 1}, {65, 2}, {66, 2}, {67, 4}, {68, 1}, {69, 1}, {70, 1}, {71, 2}, {72, 1}, {75, 2}, {76, 1}, {77, 1}, {79, 2}, {80, 1}, {83, 2}, {86, 3}, {88, 1}, {90, 5}, {94, 1}, {96, 1}, {97, 2}, {101, 1}, {105, 1}, {106, 2}, {108, 1}, {111, 1}, {113, 1}, {114, 3}, {116, 1}, {117, 1}, {118, 1}, {121, 2}, {122, 3}, {123, 4}, {125, 1}, {128, 1}, {130, 1}, {131, 1}, {133, 1}, {134, 1}, {135, 2}, {136, 2}, {141, 1}, {143, 1}, {146, 1}, {151, 1}, {156, 1}, {157, 1}, {158, 1}, {159, 1}, {160, 1}, {161, 1}, {163, 1}, {167, 1}, {168, 2}, {170, 1}, {172, 1}, {174, 1}, {177, 1}, {179, 1}, {186, 1}, {188, 1}, {194, 1}, {195, 1}, {197, 1}, {205, 1}, {226, 1}, {277, 1}}
BarChart[Range[0, Max[tally[[All, 1]]]]/.Append[Apply[Rule, tally, 1], _Integer->0], FrameLabel->{TraditionalForm[Abs["swimΔ"]], "Count"}, Frame->{True, True, False, False}]
[Figure: number of participants for each |swim Δ| (bar chart)]
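The tally-and-fill step above builds rules from `Tally` and adds `_Integer -> 0` as a fallback, so values that never occur count as zero. It can be sketched in Python with `collections.Counter`, which already returns 0 for absent keys. This is an illustrative equivalent, not the author's code; the function name and sample values are hypothetical.

```python
from collections import Counter


def abs_delta_histogram(deltas):
    """Count of each |delta| from 0 up to the largest one, with 0 for unseen values.

    deltas must be non-empty.
    """
    counts = Counter(abs(d) for d in deltas)
    return [counts[v] for v in range(max(counts) + 1)]


if __name__ == "__main__":
    hist = abs_delta_histogram([-5, -12, 7, -14, 5])  # made-up values
    print(len(hist), hist[5], hist[6])  # 15 2 0
```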
It looks like a typical participant finished the swim roughly 40 to 60 places away from their seed, in one direction or the other (the median absolute Δ is 41 and the mean is about 57). This is higher than I would have expected. The most common difference was 16 places. There must have been a lot of passing going on.

2 comments:

Scott said...

Interesting. I'd speculate that there are a few factors behind the differences in accuracy between top finishers and the rest. First is the obvious - faster racers (and better swimmers) probably have a better idea of what they can do. They most likely train more and pay attention to the clock. Second might be the combined effect of there being very few really strong swimmers and that most people probably rounded their estimate to the nearest 10, 15, 30, or even 60 seconds. This would result in many subgroups of people (particularly around the mode, which I am guessing included average-to-slow swimmers) that provided the exact same estimate, making the start order random within each subgroup. Of course, if their estimates are still reasonably accurate, one would expect that the 10 second buffer would keep passing to a minimum. Third, with such a short swim, there were probably many clusters of swimmers with very close actual swim times, but separated by many ordinal places in swim time. In this case, one could have an accurate estimate but still have a high swim place delta. I would expect this effect to be greatest for swimmers near the median. None of this explains the guy in front of me that was walking through the shallows. Congratulations on your delta of 2!

Mobaar said...

I'm curious though...if you start 10s behind someone but end up only 5s behind that person at the end of the leg, wouldn't your swim place be higher but you did NOT pass the person? Something I would want to have explained. Thanks!