photon-mosaic

Progress, UI Ideas and dev strategies

Laura Porta

Project summary

photon-mosaic is a pipeline and toolkit for automating preprocessing of multi-photon calcium imaging data.

It was designed to:

Standardize steps like contrast adjustment, suite2p, neuropil correction, etc.
Enable reproducibility via Snakemake
Be modular and extensible
Reduce manual clicks

What I want to generalise?

Estimated number of clicks per processing step per dataset: 5-30

Why Snakemake?

Snakemake helps automate reproducible workflows.

Instead of writing:

python step1.py && python step2.py && ...

You declare:

rule derotate:
  input: "raw.tif"
  output: "derotated.tif"
  shell: "python derotate.py {input} {output}"

Why Snakemake?

Snakemake tracks dependencies & reruns only what changed.

What could change?

Input files
Parameters
Code

Snakemake = DAG of Tasks

Each rule specifies:

Inputs: files that must exist before running
Outputs: files produced by the rule
Scripts: that do the actual work

Snakemake = DAG of Tasks

Snakemake builds a DAG (Directed Acyclic Graph) from these dependencies.

Only changed or missing outputs will be recomputed.

Modular Design

Organization of photon-mosaic

Modular Design

Flexible rules via “registration” of different functions:

preprocessing: derotation, contrast enhancement, etc.
postprocessing: detrending, neuropil correction, etc.
user-defined functions

Snakemake Rule Example

rule suite2p:
  input:
    tiff=lambda wc: str(...)
  output:
    F=".../F.npy",
    bin=".../data.bin"
  run:
    run_suite2p(input.tiff, output.F, output.bin, ...)

This uses logic and config patterns to determine filenames from the folder structure.

Example construction of a DAG and run

Force run the preprocessing rule triggers the following DAG:

job              count
-------------  -------
all                  1
preprocessing       40
suite2p             40
total               81

Reasons:
    (check individual jobs above for details)
    forced:
        preprocessing
    input files updated by another job:
        all, suite2p

Enforcing a Shared Folder Structure

We follow the NeuroBlueprint data standard when writing our Snakemake rules.

Can read from any folder structure using glob patterns to find your dataset folders and tiff files
But enforces a specific output folder structure

Output Standardization

Enforces the standard folder hierarchy:
- derivatives/ for processed data
- sub-XXX/ for subjects
- ses-YY/ for sessions
- funcimg/ for functional imaging data

Implementation

# Dynamic path generation in Snakemake rules
processed_data_base / f"sub-{i}_{new}" / f"ses-{j}" / "funcimg" / f"{name}.tif"

This approach:

Keeps your Snakefile clean and maintainable
Makes outputs consistent and predictable
Enables easy integration with other tools

Example derivatives folder

sub-0_230801CAA1120181
└── ses-0
    └── funcimg
        ├── derotated_full.tif
        └── suite2p
            └── plane0
                ├── F.npy
                └── ...
└── ses-1
    └── ...
sub-1_230802CAA1120182
└── ses-0
    └── ...
└── ses-1
    └── ...

Current Status

✅ Every step launches a job on the cluster
✅ Suite2p integration (via Snakemake rules)
✅ Tests for core components
✅ Docs with sphinx
☑️ Read arbitrary folder structure
☑️ Preprocessing steps
💡 Idea for interactive terminal UI in progress

Challenge: Complexity for Users

Snakemake is powerful… but YAML, paths, and rule syntax can scare users.

Hard to onboard new users
Many researchers prefer GUIs or simple CLI tools
Need a gentler interface

Proposed Solution: Terminal UI

A TUI (terminal user interface) could help:

Connect to the cluster 🤔
Select pipeline steps interactively
Auto-generate Snakemake config and edit it
Submit jobs or monitor execution
Debug failed rules with context

Feature: Add Custom User Scripts

In the TUI, we could let users:

Select a .py or .sh file
Specify input/output pattern
Register it as a new step

This allows non-developers to integrate their analysis into the pipeline without editing Snakefiles manually.

TUI Concept Sketch

Possible TUI

What’s Next?

Complete PR with preprocessing steps and dataset discovery
Make a TUI
Reformulate the docs
Add postprocessing steps

Thanks!

📦 photon-mosaic

Appendix

Building with ChatGPT and Cursor.

Learning, Drafting, and Prototyping

LLMs accelerated my work on photon-mosaic by:

Learning through conversation
- e.g. prompted ideas on how to register preprocessing steps
Generating boilerplate from structure
- turned documentaion folder notes into a draft Quarto presentation
From sketch to script
- pen-and-paper ideas → prompt draft (ChatGPT) → Cursor-almost-ready script

But There Are Limits…

LLMs struggle as complexity grows:

Snakemake integration
- Spent hours debugging path resolution across rules
- LLM hallucinated or misunderstood rule linking
Custom scripts
- LLMs helped draft but failed in having a deep understanding of the core scientific problem
- Bottleneck: my own understanding of edge cases + ability to guide it precisely
Surprisingly poor at refactoring and understanding changes in code

What’s the Future of Development?

After how long should I stop explaining to an LLM what I want?
How can we build more better and faster tools by leveraging LLMs?
How has your workflow changed?

photon-mosaic

Project summary

Reduce manual clicks

What I want to generalise?

Why Snakemake?

Why Snakemake?

Snakemake = DAG of Tasks

Snakemake = DAG of Tasks

Modular Design

Modular Design

Snakemake Rule Example

Example construction of a DAG and run

Enforcing a Shared Folder Structure

Output Standardization

Implementation

Example derivatives folder

Current Status

Challenge: Complexity for Users

Proposed Solution: Terminal UI

Feature: Add Custom User Scripts

TUI Concept Sketch

What’s Next?

Thanks!

Appendix

Learning, Drafting, and Prototyping

But There Are Limits…

What’s the Future of Development?