Beyond your script

An introduction to collaborative coding and software engineering good practice

Alessandro Felder, Laura Porta, Chang Huan Lo

Get ready!

Photo by Alice Donovan Rouse on Unsplash

Open these slides on your laptop: https://neuroinformatics.dev/course-software-good-practice/

What makes (research) code good?

Mentimeter

The next steps

import pandas
import numpy

def do_something_awesome(input):
  p = process(input)
  a = analyze(p)
  g = plot(a)
  output = g, a
  return output
  
if __name__ == "__main__":
  print(do_something_awesome(42))

The next steps

pip install my_awesome_package

The journey of a developer and their Python package

Clean up and organize your scripts 🧹
Package your project 📦
Share your project 🌍

Clean up and organize your scripts 🧹

A common scenario

Image by PhD comics

The messy reality

You’ve written your analysis script, and it “works”… but it’s chaotic:

Hardcoded numbers, file paths, and cryptic names are everywhere
Analysis logic is scattered and hard to follow
Plotting functions are tangled with data processing

Don’t move on to the next project! Your work isn’t done yet.

What might be not-so-good about these two functions?

def calculate_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time > fastest_time:
            fastest_time = time
    return fastest_time


def print_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time > fastest_time:
            fastest_time = time

    print(f"This is the fastest time: {fastest_time} 🚀")
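One possible way to address the duplication in the two functions above is to compute the value once and reuse it. This is a minimal sketch, assuming that a lower time means a faster run (hence `min`, whereas the comparison `>` above keeps the largest value). `format_fastest_time` is a hypothetical helper name that does not appear in the original.

```python
def calculate_fastest_time(time_list):
    """Return the smallest (assumed fastest) time in a non-empty list."""
    if not time_list:
        raise ValueError("time_list must contain at least one time")
    return min(time_list)


def format_fastest_time(time_list):
    """Build the message; the calculation is delegated, not repeated."""
    return f"This is the fastest time: {calculate_fastest_time(time_list)} 🚀"


def print_fastest_time(time_list):
    """Print the message for the fastest time."""
    print(format_fastest_time(time_list))


print_fastest_time([12.4, 11.9, 13.1])
```

Separating the formatting from the printing also makes the message easy to check without capturing printed output.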

Code smells

A code smell is a surface indication that usually corresponds to a deeper problem in the system.

Martin Fowler

Which code smells can we find here?

my_project/
└── do_all_analysis.py

a = 500 
b = 0.75 

# from michael old analysis pipeline
def process_data(T, sr, t):
    s = [x * t for i, x in enumerate(T.iloc[:, 0]) if (x % t) > sr and i % 3 == 0]
    return sorted(s, reverse=True)[:len(s)//2] if len(s) > 5 else s

import pandas as pd
data = pd.read_csv("/Users/helen/Desktop/29104629.csv")

spikes = process_data(data, a, b)

Improving the code

import pandas as pd

# Constants for sampling rate and spike threshold
SAMPLING_RATE = 500  # Hz
SPIKE_THRESHOLD = 0.75  # Arbitrary units


# Refactored spike detection function with clear logic and meaningful names
def detect_spikes(data, sampling_rate, threshold):
    """Return the above-threshold values from the first column, largest first.

    `sampling_rate` is accepted but not yet used in this simplified version.
    """
    spike_values = []
    for i, value in enumerate(data.iloc[:, 0]):
        # Keep samples at every third index that exceed the threshold
        if value > threshold and i % 3 == 0:
            spike_values.append(value)

    # Return the detected spike values, sorted in descending order
    return sorted(spike_values, reverse=True)


# Load the dataset
# Avoid hardcoded paths and non-meaningful filenames: ask for the path instead.
data_path = input("Path to the CSV file: ")
data = pd.read_csv(data_path)

# Detect spikes using the provided sampling rate and threshold
spikes = detect_spikes(data, SAMPLING_RATE, SPIKE_THRESHOLD)
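As an alternative to reading the path with `input()`, the path could be passed on the command line. The sketch below uses the standard-library `argparse` module. The argument name `data_path`, the `--threshold` option and the file name `recording.csv` are illustrative choices, not part of the original.

```python
import argparse
from pathlib import Path


def parse_arguments(argv=None):
    """Parse the command-line arguments of the analysis script.

    Passing `argv` explicitly makes the function easy to try out;
    with `argv=None`, argparse reads the real command line.
    """
    parser = argparse.ArgumentParser(description="Detect spikes in a CSV file.")
    parser.add_argument("data_path", type=Path, help="path to the input CSV file")
    parser.add_argument(
        "--threshold",
        type=float,
        default=0.75,
        help="spike threshold in arbitrary units",
    )
    return parser.parse_args(argv)


# Simulate calling: python my_script.py recording.csv --threshold 0.5
args = parse_arguments(["recording.csv", "--threshold", "0.5"])
print(args.data_path, args.threshold)
```

With this approach the same script can be run on different files without editing the code.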

Organize your functions

keep them small
make each one do one thing
use descriptive names
limit the number of arguments
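The points above could look like this in practice. This is a minimal sketch with made-up function names and data (reaction times in milliseconds); each function does one thing and takes a single argument.

```python
from statistics import mean


def remove_missing_values(measurements):
    """Drop entries that are None."""
    return [value for value in measurements if value is not None]


def convert_ms_to_seconds(durations_ms):
    """Convert durations from milliseconds to seconds."""
    return [duration / 1000 for duration in durations_ms]


def mean_reaction_time_in_seconds(reaction_times_ms):
    """Combine the small steps into one readable summary."""
    cleaned = remove_missing_values(reaction_times_ms)
    return mean(convert_ms_to_seconds(cleaned))


print(mean_reaction_time_in_seconds([250, None, 750]))
```

Because each step has its own name, the last function reads almost like a description of the analysis.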

Make your code modular

my_pipeline/
├── preprocess.py
├── analysis.py
├── plot.py
├── read_write.py
└── my_analysis_pipeline.py

An example pipeline

Import your functions and classes in your main analysis pipeline

# file: my_analysis_pipeline.py
from my_project.read_write import load_calcium_data
from my_project.preprocess import extract_fluorescence
from my_project.analysis import analyze_calcium_timeseries
from my_project.plot import make_figures_for_paper

def my_calcium_imaging_pipeline():
    raw_data = load_calcium_data()
    delta_fluorescence = extract_fluorescence(raw_data)
    spike_times_table = analyze_calcium_timeseries(delta_fluorescence)
    make_figures_for_paper(spike_times_table)

    print("Completed")
  
if __name__ == "__main__":
    my_calcium_imaging_pipeline()

To learn more…

Suggested resources:
- Clean Code by Robert C. Martin

Next steps

Now that your logic is split into modules and your pipeline is well crafted, you might want to move to the next step…

Package your project 📦

How can I call my functions from a different project?

my_pipeline/
├── preprocess.py
├── analysis.py
├── plot.py
├── read_write.py
└── my_analysis_pipeline.py

Demo! 🥺

What does packaging mean?

Packaging allows Python to “install” your code so that it can be re-used from anywhere.
Local, editable installation from the package directory:

pip install -e .
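To get a feel for what "installing" achieves, the sketch below creates a throwaway package in a temporary directory and makes it importable by adding that directory to `sys.path`. This is only an analogy: an installation makes the package discoverable by the environment persistently, so nobody has to edit `sys.path` by hand. The package name `my_tiny_package` and the function `greet` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    # Create a minimal package: a directory containing an __init__.py
    package_dir = Path(project_dir) / "my_tiny_package"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text(
        "def greet():\n    return 'hello from my_tiny_package'\n"
    )

    # Tell Python where to look, then import as if from "anywhere"
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    package = importlib.import_module("my_tiny_package")
    message = package.greet()

    # Clean up the manual path change
    sys.path.remove(project_dir)

print(message)  # hello from my_tiny_package
```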

Demo! 😊

The Python package structure

my-awesome-package/
├── LICENSE
├── README.md
├── pyproject.toml 👈
├── ...
└── my_awesome_package/
    ├── __init__.py 👈
    └── do_something_awesome.py

Automating package creation

Use the NIU cookiecutter template.

conda create -n "package-playground" python pip
conda activate package-playground
pip install cookiecutter
cookiecutter https://github.com/neuroinformatics-unit/python-cookiecutter

Demo! 📜

Share your project 🌍

The Python Package Index (PyPI)

Once it’s ready, publish your package.
Python packaging tutorial

This structure will also come in handy when you want to distribute your package widely.

Your package is installable by anyone

pip install my_awesome_package

Summary

└── my-awesome-package/
    ├── LICENSE
    ├── MANIFEST.in
    ├── README.md
    ├── pyproject.toml
    ├── tox.ini
    ├── docs/
    ├── my_awesome_package/
    │   ├── __init__.py
    │   └── do_something_awesome.py
    └── tests/
        ├── __init__.py
        └── test_placeholder.py
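A file in `tests/` might eventually look like the sketch below. The function `double` is a stand-in defined inline so that the example is self-contained. In a real package it would be imported from `my_awesome_package` instead, and the file name is hypothetical.

```python
# tests/test_double.py (hypothetical file name)


def double(value):
    """Stand-in for a function that would live in my_awesome_package."""
    return 2 * value


def test_double_returns_twice_the_input():
    assert double(21) == 42


def test_double_handles_zero():
    assert double(0) == 0
```

Running `pytest` from the project root would collect and run functions whose names start with `test_`.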

Conclusion

What we have learned today

Clean up and organize your scripts 🧹
Package your project 📦
Share your project 🌍

Other areas of good practice

testing
writing documentation
asynchronous collaboration (issues and pull requests)

Extra resources

If you want to practice refactoring, testing and documenting, we have prepared some exercises.

Retrospective

Anonymously tell us what you thought on this ideaboardz…
… and get in touch anytime!
