Beyond your script

An introduction to collaborative coding and software engineering good practice

Alessandro Felder, Sofía Miñano, Chang Huan Lo

Get ready!

What makes (research) code good?

Mentimeter

The next steps

Photo by Jukan Tateisi on Unsplash
import pandas
import numpy

def do_something_awesome(input):
  p = process(input)
  a = analyze(p)
  g = plot(a)
  output = g, a
  return output
  
if __name__ == "__main__":
  print(do_something_awesome(42))

The next steps

Photo by Jukan Tateisi on Unsplash
pip install my_awesome_package

The journey of a developer and their Python package

  • Clean up and organize your scripts 🧹
  • Package your project 📦
  • Share your project 🌍

Clean up and organize your scripts 🧹

A common scenario

Image by PhD comics

The messy reality

You’ve written your analysis script, and it “works”… but it’s chaotic:

  • Hardcoded numbers, file paths, and cryptic names are everywhere
  • Analysis logic is scattered and hard to follow
  • Plotting functions are tangled with data processing

Don’t move on to the next project! Your work isn’t done yet.

What might be not-so-good about these two functions?

def calculate_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time > fastest_time:
            fastest_time = time
    return fastest_time


def print_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time > fastest_time:
            fastest_time = time

    print(f"This is the fastest time: {fastest_time} 🚀")

What might be not-so-good about these two functions?

Mentimeter
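
One possible answer, sketched under assumptions: both functions repeat the same search loop, so a fix made in one is easily forgotten in the other, and the `>` comparison keeps the largest value, which would be the slowest time if a smaller number means a faster run. The variant below assumes that smaller is faster. `format_fastest_time` is a hypothetical helper name that is not part of the original slides.

```python
def calculate_fastest_time(time_list):
    """Return the smallest value, assuming a smaller time means a faster run."""
    if not time_list:
        raise ValueError("time_list must contain at least one time")
    return min(time_list)


def format_fastest_time(time_list):
    """Build the message in one place so that it can be checked without printing."""
    return f"This is the fastest time: {calculate_fastest_time(time_list)} 🚀"


def print_fastest_time(time_list):
    """Reuse the calculation instead of repeating the loop."""
    print(format_fastest_time(time_list))


if __name__ == "__main__":
    print_fastest_time([12.4, 9.8, 11.1])
```

Keeping the calculation in a single function means there is only one place to fix if the definition of "fastest" changes.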

Code smells

A code smell is a surface indication that usually corresponds to a deeper problem in the system.

  • Martin Fowler

Which code smells can we find here?

my_project/
└── do_all_analysis.py

a = 500 
b = 0.75 

# from michael old analysis pipeline
def process_data(T, sr, t):
    s = [x * t for i, x in enumerate(T.iloc[:, 0]) if (x % t) > sr and i % 3 == 0]
    return sorted(s, reverse=True)[:len(s)//2] if len(s) > 5 else s

import pandas as pd
data = pd.read_csv("/Users/helen/Desktop/29104629.csv")

spikes = process_data(data, a, b)

Which code smells can we find here?

Mentimeter

Improving the code

import pandas as pd


# Constants for sampling rate and spike threshold
SAMPLING_RATE = 500  # Hz
SPIKE_THRESHOLD = 0.75  # Arbitrary units

# Load the dataset
# Avoid hardcoded paths and non-meaningful filenames; ask the user for the path instead.
data_path = input("Path to the CSV data file: ")
data = pd.read_csv(data_path)

# Refactored spike detection function with clear logic and meaningful names
def detect_spikes(data, sampling_rate, threshold):
    spike_times = []
    for i, value in enumerate(data.iloc[:, 0]):
        # Keep every third sample whose value exceeds the threshold
        if value > threshold and i % 3 == 0:
            spike_times.append(value)

    # Return the list of detected spikes, sorted in descending order
    return sorted(spike_times, reverse=True)

# Detect spikes using the provided sampling rate and threshold
spikes = detect_spikes(data, SAMPLING_RATE, SPIKE_THRESHOLD)

Organize your functions

  • small
  • do one thing
  • use descriptive names
  • limit the number of arguments
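
As an illustration of the points above, here is a minimal sketch that splits one step into small, single-purpose functions and bundles related settings into one object to keep argument lists short. All names (`SpikeSettings`, `subsample`, `keep_above_threshold`, `detect_spikes`) and the default values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikeSettings:
    """Hypothetical container that groups related parameters."""

    threshold: float = 0.75  # arbitrary units
    step: int = 3  # keep every `step`-th sample


def subsample(values, step):
    """Keep every `step`-th value, starting with the first."""
    return values[::step]


def keep_above_threshold(values, threshold):
    """Keep only the values strictly greater than `threshold`."""
    return [value for value in values if value > threshold]


def detect_spikes(values, settings):
    """Combine the small steps; each one can be read and tested on its own."""
    candidates = subsample(values, settings.step)
    return sorted(keep_above_threshold(candidates, settings.threshold), reverse=True)


if __name__ == "__main__":
    trace = [0.1, 0.9, 0.8, 1.2, 0.2, 0.3, 0.7, 2.0]
    print(detect_spikes(trace, SpikeSettings()))
```

Grouping parameters this way is one option among several; keyword arguments with sensible defaults can serve the same purpose.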

Make your code modular

my_pipeline/
├── preprocess.py
├── analysis.py
├── plot.py
├── read_write.py
└── my_analysis_pipeline.py

An example pipeline

Import your functions and classes in your main analysis pipeline

# file: my_analysis_pipeline.py
from my_pipeline.read_write import load_calcium_data
from my_pipeline.preprocess import extract_fluorescence
from my_pipeline.analysis import analyze_calcium_timeseries
from my_pipeline.plot import make_figures_for_paper

def my_calcium_imaging_pipeline():
    raw_data = load_calcium_data()
    delta_fluorescence = extract_fluorescence(raw_data)
    spike_times_table = analyze_calcium_timeseries(delta_fluorescence)
    make_figures_for_paper(spike_times_table)

    print("Completed")
  
if __name__ == "__main__":
    my_calcium_imaging_pipeline()

To learn more…

Suggested resources:

Next steps

Now that your logic is split into modules and your pipeline is well crafted, you might want to move on to the next step…

Package your project 📦

How can I call my functions from a different project?

my_pipeline/
├── preprocess.py
├── analysis.py
├── plot.py
├── read_write.py
└── my_analysis_pipeline.py

How can I call my functions from a different project?

Mentimeter

Demo! 🥺

Demo! 😊

What does packaging mean?

Packaging allows Python to “install” your code so that it can be re-used from anywhere.
Install from your local git repo:

pip install .

Demo (editable, for development)! 😊

Packaging for local development

Editable installation from your local git repo (changes to the source take effect without reinstalling):

pip install -e .

The Python package structure

my-awesome-package/
    ├── LICENSE
    ├── README.md
    ├── pyproject.toml 👈
    ├── ...
    ├── my_awesome_package/
    │   ├── __init__.py 👈
    │   └── do_something_awesome.py
Automating package creation

Photo by Neven Krcmarek on Unsplash

Use the NIU cookiecutter template.

conda create -n package-playground python pip
conda activate package-playground
pip install cookiecutter
cookiecutter https://github.com/neuroinformatics-unit/python-cookiecutter 

Demo! 📜

Share your project 🌍

The Python Package Index (PyPI)

This structure will also come in handy when you want to distribute your package widely.

Your package is installable by anyone

pip install my_awesome_package

Summary

└── my-awesome-package/
    ├── LICENSE
    ├── MANIFEST.in
    ├── README.md
    ├── pyproject.toml
    ├── tox.ini
    ├── docs/
    ├── my_awesome_package/
    │   ├── __init__.py
    │   └── do_something_awesome.py
    └── tests/
        ├── __init__.py
        └── test_placeholder.py

Conclusion

What we have learned today

  • Clean up and organize your scripts 🧹
  • Package your project 📦
  • Share your project 🌍

General note about good practices

  • Each of these steps is useful on its own, at any time.
  • Perfect is the enemy of good.
  • Improve incrementally.

Other areas of good practice

  • testing
  • writing documentation
  • asynchronous collaboration (issues and pull requests)
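
As a taste of the first item, a test is a small function that calls your code with known input and asserts the expected result. The sketch below uses plain `assert` statements in the style that pytest collects. `normalize_trace` is a hypothetical function invented for this illustration.

```python
def normalize_trace(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("cannot normalize a constant trace")
    return [(value - lowest) / (highest - lowest) for value in values]


def test_normalize_trace_spans_zero_to_one():
    result = normalize_trace([2.0, 4.0, 6.0])
    assert result == [0.0, 0.5, 1.0]


def test_normalize_trace_rejects_constant_input():
    try:
        normalize_trace([1.0, 1.0])
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a constant trace")


if __name__ == "__main__":
    # Without a test runner, the tests can simply be called as functions.
    test_normalize_trace_spans_zero_to_one()
    test_normalize_trace_rejects_constant_input()
    print("all tests passed")
```

With pytest installed, functions whose names start with `test_` in files named `test_*.py` are discovered and run automatically.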

Extra resources

If you want to practice refactoring, testing and documenting, we have prepared some exercises.

Retrospective

  • Anonymously tell us what you thought on this ideaboardz…
  • … and get in touch anytime!