Bridging Bioimaging and Research Software Engineering
AM (technical)
PM (community)
Find these slides at https://neuroinformatics.dev/slides-big-imaging-data-osw25/.
Thank you to HEFTIE textbook authors: David Stansby, Ruaridh Gollifer and Kimberly Meechan!
bioformats
OME-zarr
bioformats can help convert from proprietary formats
Let’s dig deeper
zarr is an open-source specification for how large N-dimensional arrays should be stored.
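zarr splits an array into a regular grid of equally sized chunks, and each chunk is stored and compressed independently, so a reader only needs to fetch the chunks that overlap the region it asks for. The sketch below does that bookkeeping with the standard library only. The 806 x 629 x 629 shape matches the benchmark images; the 128-voxel cubic chunk shape is a hypothetical choice for illustration, not a recommendation from the benchmarks.

```python
from itertools import product
from math import ceil, prod

def chunk_grid(shape, chunks):
    """Number of chunks along each axis; edge chunks may be partially filled."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))

def chunks_touched(start, stop, chunks):
    """Indices of every chunk overlapping the half-open region [start, stop)."""
    per_axis = [range(lo // c, (hi - 1) // c + 1) for lo, hi, c in zip(start, stop, chunks)]
    return list(product(*per_axis))

shape = (806, 629, 629)   # image size used in the benchmarks
chunks = (128, 128, 128)  # hypothetical chunk shape, chosen only for illustration

grid = chunk_grid(shape, chunks)
touched = chunks_touched((100, 100, 100), (200, 200, 200), chunks)
print(grid, prod(grid))   # (7, 5, 5) 175
print(len(touched))       # 8
```

Reading the 100-voxel cube in this example decodes 8 of the 175 chunks rather than the whole image, which is the main reason the chunk shape matters.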
where should we store the data?
how big should we make the chunks?
how should we compress each chunk?
Criteria:
Luckily, Ruaridh and Kimberly can help!
Open-source zarr-benchmarks repository which is freely available
Python based using pytest-benchmark
SOON: a report summarising the findings with plots (still in progress)
3 images: heart (335 MB), dense segmentation, sparse segmentation
All sized 806 x 629 x 629
All 16-bit unsigned integer
Choice of compression library affects read time more than compression level.
Choice of compression library affects write time + higher compression levels take longer to write.
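The benchmarks use zarr codecs such as blosc-zstd. As a stand-in that needs only the Python standard library, the sketch below sweeps zlib, bz2 and lzma over a few levels on synthetic 16-bit data and records the compression ratio and the time to compress (a proxy for write time). The codecs, levels and synthetic data are all assumptions, so the numbers will not reproduce the zarr-benchmarks results. Timing here is a single `time.perf_counter` measurement, whereas pytest-benchmark repeats runs and reports statistics.

```python
import bz2
import lzma
import random
import time
import zlib
from array import array

def synthetic_uint16(n, seed=0):
    """Hypothetical test data: a slow ramp plus 4 bits of noise, stored as uint16 bytes."""
    rng = random.Random(seed)
    return array("H", ((i // 64) % 4096 + rng.randint(0, 15) for i in range(n))).tobytes()

# Standard-library codecs standing in for zarr compressors such as blosc-zstd
CODECS = {
    "zlib": lambda data, level: zlib.compress(data, level),
    "bz2": lambda data, level: bz2.compress(data, compresslevel=level),
    "lzma": lambda data, level: lzma.compress(data, preset=level),
}

def sweep(data, levels=(1, 3, 6)):
    """Return (codec, level, compression ratio, seconds to compress) for every combination."""
    results = []
    for name, compress in CODECS.items():
        for level in levels:
            start = time.perf_counter()
            compressed = compress(data, level)
            elapsed = time.perf_counter() - start
            results.append((name, level, len(data) / len(compressed), elapsed))
    return results

if __name__ == "__main__":
    for name, level, ratio, secs in sweep(synthetic_uint16(500_000)):
        print(f"{name:>4} level {level}: ratio {ratio:5.2f}, {secs * 1000:7.1f} ms")
```

Grouping the printed rows by codec makes it easy to see whether switching library or raising the level moves the numbers more on your own data.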
Tensorstore is a lot faster at reading data than zarr-python.
Large chunks compress worse + increase memory usage
Larger chunks are faster to read/write overall (+ make fewer files)
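The file-count and memory side of this trade-off can be worked out before running anything. The sketch below assumes cubic chunks on the 806 x 629 x 629 uint16 benchmark image and counts the chunks (on a plain filesystem store, up to one file each, since empty chunks may be skipped) and the uncompressed bytes per chunk. The edge lengths tried are hypothetical; compression and speed still have to be measured.

```python
from math import ceil, prod

SHAPE = (806, 629, 629)  # benchmark image shape
ITEMSIZE = 2             # bytes per voxel for 16-bit unsigned integers

def chunk_tradeoff(edge):
    """For cubic chunks of the given edge length, return
    (number of chunks, uncompressed bytes per chunk)."""
    chunks = (edge,) * len(SHAPE)
    n_chunks = prod(ceil(s / c) for s, c in zip(SHAPE, chunks))
    chunk_bytes = prod(chunks) * ITEMSIZE  # a whole chunk is typically decoded to read any voxel in it
    return n_chunks, chunk_bytes

for edge in (32, 64, 128, 256):
    n_chunks, chunk_bytes = chunk_tradeoff(edge)
    print(f"{edge:>3}^3 chunks: {n_chunks:>6} chunks, {chunk_bytes / 2**20:5.2f} MiB each uncompressed")
```

Going from 32^3 to 256^3 chunks cuts the chunk count from 10400 to 36 while raising the per-chunk memory from 64 KiB to 32 MiB.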
Compression ratio vs write time plot for heart image (left) and dense segmentation (right)
Segmentations normally compress a lot more (see the compression ratio: the y-axis values are much higher)
You may have to use different settings depending on your image.
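One plausible reason segmentations compress better: they hold a small set of integer labels in large constant regions, whereas intensity images carry noise in their low-order bits, so a general-purpose compressor finds far more repetition in the labels. The sketch below compares the two on small synthetic volumes with zlib. The volume shape, label layout and noise model are hypothetical, and zlib stands in for the compressors actually benchmarked.

```python
import random
import zlib
from array import array

SHAPE = (32, 64, 64)  # small hypothetical volume (z, y, x)

def dense_labels(shape):
    """Hypothetical dense segmentation: every voxel belongs to a box-shaped label."""
    nz, ny, nx = shape
    return array("H", (1 + (z // 8) * 16 + (y // 16) * 4 + x // 16
                       for z in range(nz) for y in range(ny) for x in range(nx)))

def noisy_intensity(shape, seed=0):
    """Hypothetical intensity image: a smooth gradient plus 6 bits of random noise."""
    rng = random.Random(seed)
    nz, ny, nx = shape
    return array("H", (500 + 4 * (x + y + z) + rng.randint(0, 63)
                       for z in range(nz) for y in range(ny) for x in range(nx)))

def ratio(image, level=6):
    """Uncompressed size divided by zlib-compressed size."""
    raw = image.tobytes()
    return len(raw) / len(zlib.compress(raw, level))

seg_ratio = ratio(dense_labels(SHAPE))
img_ratio = ratio(noisy_intensity(SHAPE))
print(f"segmentation {seg_ratio:.1f}x, intensity {img_ratio:.1f}x")
```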
Approximately “optimal” choices*
tensorstore library for fastest read/write
blosc-zstd is a good default compressor
This is different for different data! Worth testing some small samples of your own data with different settings
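Both recommendations can be combined in a TensorStore spec. The sketch below builds, as a plain dictionary, a spec for TensorStore's zarr (v2) driver that stores uint16 data compressed with blosc using the zstd codec. The path, chunk shape, `clevel=5` and `shuffle=1` (byte shuffle) are assumptions for illustration; check the field names against the TensorStore zarr driver documentation for your version.

```python
import json

def blosc_zstd_zarr_spec(path, shape, chunks, clevel=5):
    """TensorStore spec (as a dict) for a zarr v2 uint16 array compressed with blosc-zstd."""
    return {
        "driver": "zarr",                             # TensorStore's zarr v2 driver
        "kvstore": {"driver": "file", "path": path},  # local filesystem storage
        "metadata": {
            "shape": list(shape),
            "chunks": list(chunks),                   # hypothetical chunk shape
            "dtype": "<u2",                           # little-endian 16-bit unsigned integer
            "compressor": {"id": "blosc", "cname": "zstd", "clevel": clevel, "shuffle": 1},
        },
    }

spec = blosc_zstd_zarr_spec("/tmp/heart.zarr", (806, 629, 629), (128, 128, 128))
print(json.dumps(spec, indent=2))
```

With TensorStore installed, a dict like this could be passed to `tensorstore.open(spec, create=True).result()` to create the array; trying a few `clevel` and chunk values on a small crop of your own data is the cheap way to follow the advice above.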
Chunks help with reading subsets of pixels, but what if you want to view the image as a whole?
Idea: multiscale images!
Even better idea: multiscale images that follow a standard.
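A multiscale image stores the same data at successively coarser resolutions, so a viewer can show the whole image from a small level and fetch full-resolution chunks only when zooming in. OME-Zarr standardises how those levels are described. The sketch below computes level shapes by repeated halving and builds "multiscales" attributes following the OME-NGFF 0.4 layout (newer versions nest this metadata differently). The factor of 2, the 64-voxel stopping size, the micrometre unit and the 1.0 voxel size are hypothetical placeholders, and no pixel data is written.

```python
import json

def pyramid_shapes(shape, factor=2, min_size=64):
    """Shapes of successively downsampled levels, stopping before any axis drops below min_size."""
    levels = [tuple(shape)]
    while min(levels[-1]) // factor >= min_size:
        levels.append(tuple(s // factor for s in levels[-1]))
    return levels

def multiscales_metadata(n_levels, voxel_size=(1.0, 1.0, 1.0), factor=2):
    """OME-NGFF 0.4-style 'multiscales' attributes for a z, y, x image."""
    datasets = [
        {
            "path": str(level),  # each level is a zarr array named "0", "1", ...
            "coordinateTransformations": [
                {"type": "scale", "scale": [v * factor**level for v in voxel_size]}
            ],
        }
        for level in range(n_levels)
    ]
    axes = [{"name": name, "type": "space", "unit": "micrometer"} for name in "zyx"]
    return {"multiscales": [{"version": "0.4", "axes": axes, "datasets": datasets}]}

levels = pyramid_shapes((806, 629, 629))
print(levels)  # [(806, 629, 629), (403, 314, 314), (201, 157, 157), (100, 78, 78)]
print(json.dumps(multiscales_metadata(len(levels)), indent=2))
```

The coarsest level here is about 1.2 MB of uint16 data instead of roughly 638 MB, small enough to display in one go.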
zarr v2 versus v3
“The colourful mouse bone challenge”
Label a mouse tibia by chunks
.zattrs and .zgroup files
You may additionally need a reader plugin for your specific image data, e.g. if you have sldy images
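Labelling by chunks comes down to knowing which stored chunk a voxel falls in, and that key is spelled differently in the two zarr versions: v2 joins chunk indices with "." by default and keeps metadata in `.zarray`, `.zgroup` and `.zattrs`, while v3 uses a single `zarr.json` and, by default, keys such as `c/0/1/2`. The sketch below assumes the default key encodings; the chunk shape and voxel position are hypothetical.

```python
def chunk_key(voxel, chunks, zarr_format):
    """Store key of the chunk containing a voxel, using each format's default key encoding."""
    idx = [str(v // c) for v, c in zip(voxel, chunks)]
    if zarr_format == 2:
        return ".".join(idx)          # v2 default: "." separator, e.g. "2.0.3"
    if zarr_format == 3:
        return "/".join(["c", *idx])  # v3 "default" encoding: "c" prefix, "/" separator
    raise ValueError(f"unsupported zarr_format: {zarr_format}")

# Metadata documents stored next to the chunks (.zattrs holds user attributes and may be absent)
METADATA_FILES = {
    2: {"group": [".zgroup", ".zattrs"], "array": [".zarray", ".zattrs"]},
    3: {"group": ["zarr.json"], "array": ["zarr.json"]},
}

chunks = (128, 128, 128)  # hypothetical chunk shape
voxel = (300, 10, 500)    # hypothetical voxel inside the tibia
print(chunk_key(voxel, chunks, 2), METADATA_FILES[2]["array"])  # 2.0.3 ['.zarray', '.zattrs']
print(chunk_key(voxel, chunks, 3), METADATA_FILES[3]["array"])  # c/2/0/3 ['zarr.json']
```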
Run on small data first!
or download from https://github.com/neuroinformatics-unit/slides-big-imaging-data-osw25/blob/main/tutorials/ and then run from the tutorials/ folder in your terminal.
e.g. the IARPA MICrONS dataset
This IARPA MICrONS dataset spans a 1.4 mm x 0.87 mm x 0.84 mm volume of cortex in a P87 mouse. The dataset was imaged using two-photon microscopy, microCT, and serial electron microscopy, and then reconstructed using a combination of AI and human proofreading.
Madeline Lancaster, a neuroscientist at the University of Cambridge, UK, can relate to that. In July, she received a total of 36 applications for a postdoctoral position in her laboratory, many fewer than the couple of hundred that she originally expected. “I had been nervous that I wouldn’t be able to go through all of the applications,” she says. Those 36 didn’t lead to a single appointment. “I still have not filled the position,” she says. “There seems to be lots of competition for strong candidates.” 1
Those who stayed and landed a coveted faculty position were more likely to have had a highly cited paper, changed their research topic between their PhD and postdoc, or moved abroad after receiving their doctorate. 1
SSI Collaborations Workshop
Imaging Conferences
Peter Sieling, CC BY 2.0.
A bridge between Bioimaging and Research Software Engineering
RTP=“(digital) Research Technology Professional”
The job framework consists of several individual job descriptions ranging from:
The framework allows clear definition and development of each role,…
a mixture of service delivery, research, innovation, and teaching activities according to your own preferences and skills, and appropriate to your level of seniority.
Research Software Developers, Research Infrastructure Developers, Research Data Stewards, and Research Data Scientists – knowing that these are fluid categories, and welcome those who cross the boundaries between these.
We consider that bioimaging involves four different types of expertise.
- Life Scientists (e.g. Biologists) …
- Instrumentalists (e.g. Microscopists) …
- Developers (e.g. Image processing algorithm developers, programmers and computer scientists) …
- Bioimage analysts are a new type of experts in BioImaging, they select appropriate image processing algorithms and their implementations, and assemble them for conducting practical Bioimage Analysis.
One of the aims of NEUBIAS is to explicitly promote the mutual communication between these four communities of experts and to establish the role of Bioimage Analysts in Life Science
Are Bioimage Analysts a specialist/domain-specific RTP?
“Speedblogging”: an “evolved” way of writing up discussion notes from small groups
- Split up into small groups and discuss
- Write up notes together
Speedblogging
- choose a question of interest
- self-organise into groups
- assign a chair and a scribe
- the chair ensures everyone gets the opportunity to contribute
- the scribe takes notes
https://www.software.ac.uk/guide/speed-blogging-and-tips-writing-speed-blog-post
These could include all or some of:
- Why is this issue important?
- Summary of the discussion topic
- Recommendations
- Other significant points outside the main topic of discussion
Big Imaging Data | NIU Open Software Week | 2025-08-14