GSoC NIU Projects 2025: ethology
If you are interested in any of these ethology projects, get in touch! Feel free to open a new topic on our Zulip GSoC channel and ask the community.
Our working language is English, but our mentors for these projects also speak Spanish.
Support for any-point trackers in ethology
The main goal of ethology is to facilitate the application of a wide range of computer vision tasks to animal behaviour research, by providing a unified data analysis interface across these tasks.
Any-point tracking is a good example of a task that is maturing within the field of computer vision, but is still relatively inaccessible to animal behaviour researchers. The task consists of the following: given a video and a set of query points on a frame of that video, predict the location of those points on every other frame of the video. This is a more general problem than pose estimation, which typically focuses on predicting the location of a fixed set of keypoints on an animal’s body. As a result, any-point trackers could prove very valuable for studying animal behaviour, with the potential to supplement or even replace pose estimation.
Depending on the quality of the trajectories generated by any-point trackers, these could be useful for studying the movement patterns of animals directly, or as a semi-automatic way to quickly generate labelled data. In recent years, there has been an increase in the development of any-point trackers, such as cotracker3 and TAPIR. The goal of this project is to add support for any-point trackers to ethology, so that users can easily apply these tools to their data and analyse the generated trajectories.
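For a sense of what the back-end inference step could look like, below is a minimal sketch using cotracker3 via torch.hub. The hub entry-point name and the exact tensor conventions shown here follow the examples in the CoTracker repository, and should be treated as assumptions to verify:

```python
import torch

# Load a pre-trained CoTracker3 model from torch.hub
# (entry-point name as given in the CoTracker repository; verify before use)
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_offline")

# video: float tensor of shape (batch, n_frames, 3, height, width)
video = torch.randn(1, 50, 3, 480, 640)  # stand-in for a real video

# queries: one row per query point as (frame_index, x, y),
# i.e. shape (batch, n_points, 3)
queries = torch.tensor([[[0.0, 100.0, 200.0],
                         [0.0, 300.0, 150.0]]])

# pred_tracks: (batch, n_frames, n_points, 2) point locations per frame
# pred_visibility: (batch, n_frames, n_points) per-point visibility
pred_tracks, pred_visibility = cotracker(video, queries=queries)
```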
Deliverables
The expected deliverables include:
A prototype napari widget that allows the user to load a video and select the query points to track.
Back-end functionality to read the query points from the napari widget and run inference on a pre-trained any-point tracker model, such as cotracker3 or TAPIR.
Ability to read the generated trajectories as a movement dataset (see the sketch after this list).
Front-end support on the napari widget to overlay the trajectories generated by the any-point tracker on the video.
As a stretch goal, the widget could be extended to allow the user to run inference directly from the napari GUI.
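To illustrate the movement integration, here is a rough sketch of wrapping predicted tracks into a movement dataset via its load_poses.from_numpy helper, treating each query point as a keypoint of a single individual. The expected array layout and argument names are assumptions; check the movement documentation for the exact conventions:

```python
import numpy as np
from movement.io import load_poses

# Tracks from the any-point tracker: (n_frames, n_points, 2), in pixels
pred_tracks = np.random.rand(50, 3, 2) * 100  # stand-in values
n_frames, n_points, _ = pred_tracks.shape

# Rearrange into a position array, with each query point treated as a
# "keypoint" of one individual. NOTE: the target layout
# (n_frames, n_space, n_keypoints, n_individuals) is an assumption here;
# verify against the movement docs.
position_array = pred_tracks.transpose(0, 2, 1)[..., np.newaxis]

ds = load_poses.from_numpy(
    position_array=position_array,
    individual_names=["individual_0"],
    keypoint_names=[f"point_{i}" for i in range(n_points)],
    fps=30,
)
print(ds)
```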
Duration
Large (~350 hours)
Difficulty
This project is well-suited for an intermediate or advanced contributor to open source.
Required skills
Experience with Python and PyTorch.
Nice-to-haves
Experience developing or using napari plugins.
Experience with computer vision applications, particularly pose estimation and any-point tracking approaches.
Potential mentors
Further reading
napari Plugin documentation, particularly the section on Building a plugin.
Support for low-shot detectors in ethology
Low-shot detection is a computer vision task that aims to detect objects in images from very few labelled examples (between 0 and 5). It could be of great use to animal behaviour researchers, but it is still somewhat inaccessible to them. For example, it could be particularly useful in collective animal behaviour research, where obtaining large labelled datasets for training accurate detection models can be tedious. It could also help with labelling large datasets collected in the lab, where frames tend to be similar in appearance, since animals are commonly recorded with static cameras against relatively uniform backgrounds.
The goal of this project is to support low-shot detection in ethology, initially as a semi-automatic approach to labelling bounding boxes. The workflow could look something like this: given a set of frames extracted from a video, the user would load these files into a GUI (likely a napari widget) and use the GUI annotation tools to manually draw bounding boxes around the objects they want to detect. These annotations define the visual prompts, and could be drawn on a single frame or on multiple frames. The user would then select a low-shot detector model to label the remaining instances in the dataset. After the model has labelled the dataset, the user should be able to review the results and correct any mistakes. The final output would be a bounding box annotations object, which could then be used to create a detection dataset for training a supervised learning model.
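To make the prompt-collection step concrete, here is a minimal sketch of gathering visual prompts with napari's built-in Shapes layer; the conversion to (xmin, ymin, xmax, ymax) boxes is illustrative:

```python
import napari
import numpy as np
from skimage import data  # sample image, standing in for an extracted frame

viewer = napari.Viewer()
viewer.add_image(data.astronaut(), name="frame")

# The user would draw rectangles with the Shapes layer tools; here we
# pre-populate one box as a stand-in for a manual annotation.
shapes = viewer.add_shapes(
    [np.array([[50, 50], [50, 200], [200, 200], [200, 50]])],
    shape_type="rectangle",
    name="visual prompts",
)

# Each rectangle is stored as a (4, 2) array of corner coordinates in
# (row, column) order; convert to (xmin, ymin, xmax, ymax) in pixels.
prompts = []
for corners in shapes.data:
    ymin, xmin = corners.min(axis=0)
    ymax, xmax = corners.max(axis=0)
    prompts.append((xmin, ymin, xmax, ymax))

# `prompts` would then be passed to the low-shot detector as visual exemplars
napari.run()
```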
Deliverables
The expected deliverables include:
Back-end functionality with a user-friendly API to read a set of manually labelled bounding boxes and run inference on a pre-trained low-shot detection model, such as those provided by GeCo or CountGD. The detections should be formatted as an ethology annotations dataframe. Ideally we would support a couple of models with a similar API (a sketch of what this could look like is shown after the stretch goal below).
A front-end napari widget that allows the user to load a set of extracted frames, manually draw bounding boxes around the objects they want to detect, and run inference on the rest of the data using a low-shot detection model. The widget should also allow the user to review and correct the labels generated by the model.
As a stretch goal, we could explore the possibility of specifying the target object with a text prompt, as is done in CountGD.
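As a rough illustration of the back-end deliverable, the entry point could look something like the function below. Both the function name and the dataframe columns are hypothetical stand-ins, not ethology's actual API or final schema:

```python
import pandas as pd

def run_low_shot_detection(frame_paths, prompt_boxes, model_name="geco"):
    """Hypothetical back-end entry point: run a pre-trained low-shot
    detector (e.g. GeCo or CountGD) on the frames in `frame_paths`,
    using the (xmin, ymin, xmax, ymax) `prompt_boxes` as visual
    exemplars. Returns one row per detection.
    """
    # ... model loading and inference would go here; dummy detections
    # are returned so that this sketch runs end to end.
    return pd.DataFrame(
        {
            "image_filename": ["frame_000.png", "frame_001.png"],
            "x_min": [10.0, 12.0],
            "y_min": [20.0, 22.0],
            "width": [35.0, 34.0],
            "height": [40.0, 41.0],
            "confidence": [0.91, 0.88],
        }
    )

df = run_low_shot_detection(
    frame_paths=["frame_000.png", "frame_001.png"],
    prompt_boxes=[(5.0, 5.0, 60.0, 70.0)],
)
print(df)
```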
Duration
Large (~350 hours)
Difficulty
This project is well-suited for an intermediate or advanced contributor to open source.
Required skills
Experience with Python and PyTorch.
Nice-to-haves
Experience developing or using napari plugins.
Experience with computer vision applications, particularly vision transformers and detection.
Potential mentors
Further reading
napari Plugin documentation, particularly the section on Building a plugin.