Sport-Analytics with YOLO et al.

Shahar Gino
4 min read · Feb 12, 2023


YOLO is truly impressive and provides a working out-of-the-box solution for various real-time detection problems, even with minimal programming skills. YOLOv8 is the latest member of the YOLO family, released earlier this year (Jan '23), and is already gaining traction rapidly.

However, many complex problems only start with YOLO and require additional processing layers on top of the object detection it provides. One common example is object tracking, aka object registration in this context, i.e. processing a video stream while keeping a unique track of each detected object across the incoming frames.

Object-tracking performance appears to be highly sensitive to the scene setup (illumination, overlaps, etc.) and to the video characteristics (fps, stride, resolution, etc.). Moreover, objects may move in and out of the frame, so the detection of any given object can be discontinuous. That might be why only a few frameworks try to tackle this problem, e.g. ByteTrack, and none appears to act as a complete solution. It therefore seems that each specific problem requires its own special sauce of object-tracking handling.

Sport-Analytics is an example of such a complex problem. It has been addressed by a few frameworks, e.g. Roboflow, but the problem is still considered 'open', mainly because the tracking does not perform very robustly.

Sport-Analytics may be addressed with the following architecture scheme:

Sport-Analytics proposed architecture

The overall pipeline is described below:

  1. Fetch a new frame from the video source
  2. Downscale to the model's input resolution, e.g. 640x640 for YOLOv5/YOLOv8.
  3. Objects Detection ➔ apply YOLO to detect the players within the given frame, ending up with a list of bounding boxes, one per player.
  4. Camera Movement Compensation, aka Motion Stabilization, to accommodate moving-camera artifacts. This can be achieved by applying a feature-point detector (cv2.goodFeaturesToTrack) followed by an optical-flow calculation (cv2.calcOpticalFlowPyrLK). From there, it is possible to extract the 2D transformation matrix and apply the respective warp (cv2.estimateAffinePartial2D and cv2.warpAffine).
  5. Players Registration, aka tracking ➔ the most exhaustive part of the pipeline. It starts by enfolding each detected player into a designated Player class, and ends with a players-similarity analysis, i.e. determining which previously seen player is the best match for each player in the given frame. The similarity metric must be efficient in terms of performance, i.e. it must hit a sweet spot between accuracy and complexity/runtime. One common metric is an L2 distance over a weighted mean of Location, Color, Size and Aspect-Ratio (a more complex metric such as MSSIM might be penalized with too high a runtime). The best color space may vary per specific problem, and it is important to explore that area. It is also useful to remove the background before sampling the player's color, e.g. with the handy rembg library.
  6. Crowd Filtering ➔ rejecting player detections that do not respect "true player" characteristics, e.g. size, speed, color, number of neighbors, etc. The exact definition may vary per specific problem, so it is important to build this part as adaptively and parametrically as possible. It is also important to explore the built-in trade-off between True Positives (true crowd) and False Positives (players falsely marked as crowd) and to find a working point that ensures high Precision.
  7. Overlaps Update ➔ players which existed in previous frames but are missing in the current frame are considered "Missing". They may have turned up missing due to a miss-detection (YOLO), an overlap with another player, or an out-of-frame event. A designated logic distinguishes between these 3 cases and updates the Player objects accordingly. In case of an overlap, the foreground player holds the overlapped player in its background.
  8. Conflicts Resolution ➔ handle cases where the Players Registration ends up with duplicate IDs, i.e. multiple Players sharing the same ID. In such cases, the one with the highest similarity score keeps its ID, while the others fall back to a sub-optimal registration (picking the 2nd-best registration solution per conflict).

Following is an example of the Camera Movement Compensation phase:

Camera Movement Compensation (aka Motion Stabilization) example

Following is an example of the Players Registration phase:

Sports Analytics (Soccer as an example)

Finally, the processed frame ends up with registered-players metadata that is ready for publishing. The data can then be written into a Postgres database for real-time/offline playback by a designated player at the frontend.

The overall accuracy of the Sports-Analytics processing can be analyzed with the common Precision, Recall and F1 metrics, with the following terminology:

  • True Positive (TP) = Correct detection and a Correct ID
  • False Positive (FP) = Incorrect detection (e.g. Crowd, Referee, etc.) or ID
  • False Negative (FN) = Missed detection

Precision, Recall and F1 are then defined as:

  • Precision (P) = TP / (TP + FP)
  • Recall (R) = TP / (TP + FN)
  • F1 = 2 * P * R / (P + R)

A common rule of thumb is to reach a PoC level at ~0.82 F1 (R ≈ 0.8, P ≈ 0.85), and a production level at ~0.92 F1 (R ≈ 0.9, P ≈ 0.95). The exact requirement is of course project-dependent.
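The rule-of-thumb numbers can be checked directly from the definitions above; e.g. P = 0.85 and R = 0.8 indeed land near 0.82 F1:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall and F1 from the TP/FP/FN counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# 68 correct IDs, 12 false detections/IDs, 17 missed detections
p, r, f1 = precision_recall_f1(tp=68, fp=12, fn=17)
# -> p = 0.85, r = 0.80, f1 ~= 0.824 (the PoC-level working point)
```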
