Large Language Models on Race Commentary: Towards Granular Data in Cycling Analytics

Authors: Janssens, Bram; Bogaert, Matthias; Verstockt, Steven
Dates: 2025-05-12, 2025-05-11, 2025-05-12
Year: 2025
Type: Proceedings paper
ISBN: 978-3-031-86691-3, 978-3-031-86692-0
ISSN: 1865-0929
DOI: 10.1007/978-3-031-86692-0_2
Web of Science: WOS:001473398800002
URL: https://imec-publications.be/handle/20.500.12860/45621

Abstract: Current cycling analytics studies are limited to data about the eventual race results. This study investigates how online commentary can be used to capture information about in-race dynamics by harnessing the power of large language models. The results show that the direct application of these models is already promising, but not accurate enough to base end-to-end machine learning applications on the generated data. Our results also show that these models tend to use information from previous queries in their generation step, which indicates data leakage and might hamper the scientific validation of approaches that compare various techniques. To capture overall rider behavior, we suggest using graph representation learning. Our results indicate that this method is capable of identifying similar rider behavior, which was not feasible to date.