photon-mosaic

Progress, UI Ideas and dev strategies

Laura Porta

Project summary

photon-mosaic is a pipeline and toolkit for automating preprocessing of multi-photon calcium imaging data.

It was designed to:

  • Standardize steps like contrast adjustment, suite2p, neuropil correction, etc.
  • Enable reproducibility via Snakemake
  • Be modular and extensible
  • Reduce manual clicks

What I want to generalise?

Estimated number of clicks per processing step per dataset: 5-30

Why Snakemake?

Snakemake helps automate reproducible workflows.

Instead of writing:

python step1.py && python step2.py && ...

You declare:

rule derotate:
  input: "raw.tif"
  output: "derotated.tif"
  shell: "python derotate.py {input} {output}"

Why Snakemake?

Snakemake tracks dependencies & reruns only what changed.

What could change?

  • Input files
  • Parameters
  • Code

Snakemake = DAG of Tasks

Each rule specifies:

  • Inputs: files that must exist before running
  • Outputs: files produced by the rule
  • Scripts: that do the actual work

Snakemake = DAG of Tasks

Snakemake builds a DAG (Directed Acyclic Graph) from these dependencies.

Only changed or missing outputs will be recomputed.

Modular Design

Organization of photon-mosaic

Modular Design

Flexible rules via “registration” of different functions:

  • preprocessing: derotation, contrast enhancement, etc.

  • postprocessing: detrending, neuropil correction, etc.

  • user-defined functions

Snakemake Rule Example

rule suite2p:
  input:
    tiff=lambda wc: str(...)
  output:
    F=".../F.npy",
    bin=".../data.bin"
  run:
    run_suite2p(input.tiff, output.F, output.bin, ...)

This uses logic and config patterns to determine filenames from the folder structure.

Example construction of a DAG and run

Force run the preprocessing rule triggers the following DAG:

job              count
-------------  -------
all                  1
preprocessing       40
suite2p             40
total               81

Reasons:
    (check individual jobs above for details)
    forced:
        preprocessing
    input files updated by another job:
        all, suite2p

Enforcing a Shared Folder Structure

We follow the NeuroBlueprint data standard when writing our Snakemake rules.

  • Can read from any folder structure using glob patterns to find your dataset folders and tiff files
  • But enforces a specific output folder structure

Output Standardization

  • Enforces the standard folder hierarchy:
    • derivatives/ for processed data
    • sub-XXX/ for subjects
    • ses-YY/ for sessions
    • funcimg/ for functional imaging data

Implementation

# Dynamic path generation in Snakemake rules
processed_data_base / f"sub-{i}_{new}" / f"ses-{j}" / "funcimg" / f"{name}.tif"

This approach:

  • Keeps your Snakefile clean and maintainable
  • Makes outputs consistent and predictable
  • Enables easy integration with other tools

Example derivatives folder

sub-0_230801CAA1120181
└── ses-0
    └── funcimg
        ├── derotated_full.tif
        └── suite2p
            └── plane0
                ├── F.npy
                └── ...
└── ses-1
    └── ...
sub-1_230802CAA1120182
└── ses-0
    └── ...
└── ses-1
    └── ...

Current Status

  • ✅ Every step launches a job on the cluster
  • ✅ Suite2p integration (via Snakemake rules)
  • ✅ Tests for core components
  • ✅ Docs with sphinx
  • ☑️ Read arbitrary folder structure
  • ☑️ Preprocessing steps
  • 💡 Idea for interactive terminal UI in progress

Challenge: Complexity for Users

Snakemake is powerful… but YAML, paths, and rule syntax can scare users.

  • Hard to onboard new users
  • Many researchers prefer GUIs or simple CLI tools
  • Need a gentler interface

Proposed Solution: Terminal UI

A TUI (terminal user interface) could help:

  • Connect to the cluster 🤔
  • Select pipeline steps interactively
  • Auto-generate Snakemake config and edit it
  • Submit jobs or monitor execution
  • Debug failed rules with context

Feature: Add Custom User Scripts

In the TUI, we could let users:

  1. Select a .py or .sh file
  2. Specify input/output pattern
  3. Register it as a new step

This allows non-developers to integrate their analysis into the pipeline without editing Snakefiles manually.

TUI Concept Sketch

Possible TUI

What’s Next?

Thanks!

📦 photon-mosaic

Appendix

Building with ChatGPT and Cursor.

Learning, Drafting, and Prototyping

LLMs accelerated my work on photon-mosaic by:

  1. Learning through conversation
    • e.g. prompted ideas on how to register preprocessing steps
  2. Generating boilerplate from structure
    • turned documentaion folder notes into a draft Quarto presentation
  3. From sketch to script
    • pen-and-paper ideas → prompt draft (ChatGPT) → Cursor-almost-ready script

But There Are Limits…

LLMs struggle as complexity grows:

  • Snakemake integration
    • Spent hours debugging path resolution across rules
    • LLM hallucinated or misunderstood rule linking
  • Custom scripts
    • LLMs helped draft but failed in having a deep understanding of the core scientific problem
    • Bottleneck: my own understanding of edge cases + ability to guide it precisely
  • Surprisingly poor at refactoring and understanding changes in code

What’s the Future of Development?

  • After how long should I stop explaining to an LLM what I want?
  • How can we build more better and faster tools by leveraging LLMs?
  • How has your workflow changed?