Beyond your script

An introduction to collaborative coding and software engineering good practice

Alessandro Felder, Laura Porta, Chang Huan Lo

Get ready!

What makes (research) code good?

Mentimeter

The next steps

Photo by Jukan Tateisi on Unsplash
import pandas
import numpy

def do_something_awesome(input):
  p = process(input)
  a = analyze(p)
  g = plot(a)
  output = g, a
  return output
  
if __name__ == "__main__":
  print(do_something_awesome(42))

The next steps

Photo by Jukan Tateisi on Unsplash
pip install my_awesome_package

The journey of a developer and its Python package

  • Clean up and organize your scripts 🧹
  • Package your project πŸ“¦
  • Share your project 🌍

Clean up and organize your scripts 🧹

A common scenario

Image by PhD comics

A common scenario

Image by PhD comics

A common scenario

Image by PhD comics

The messy reality

You’ve written your analysis script, and it β€œworks”… but it’s chaotic:

  • Hardcoded numbers, file paths, and cryptic names are everywhere
  • Analysis logic is scattered and hard to follow
  • Plotting functions are tangled with data processing

Don’t move on to the next project! Your work isn’t done yet.

What might be not-so-good about these two functions?

def calculate_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time > fastest_time:
            fastest_time = time
    return fastest_time


def print_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time > fastest_time:
            fastest_time = time

    print(f"This is the fastest time: {fastest_time} πŸš€")

What might be not-so-good about these two functions?

Mentimeter

Code smells

A code smell is a surface indication that usually corresponds to a deeper problem in the system.

  • Martin Fowler

Which code smells can we find here?

my_project/
└── do_all_analysis.py

a = 500 
b = 0.75 

# from michael old analysis pipeline
def process_data(T, sr, t):
    s = [x * t for i, x in enumerate(T.iloc[:, 0]) if (x % t) > sr and i % 3 == 0]
    return sorted(s, reverse=True)[:len(s)//2] if len(s) > 5 else s

import pandas as pd
data = pd.read_csv("/Users/helen/Desktop/29104629.csv")

spikes = process_data(data, a, b)

Which code smells can we find here?

Mentimeter

Improving the code

import pandas as pd


# Constants for sampling rate and spike threshold
SAMPLING_RATE = 500  # Hz
SPIKE_THRESHOLD = 0.75  # Arbitrary units

# Load the dataset
# Avoid hardcoded paths and non-meaningful filenames; use a descriptive path.
data_path = input()
data = pd.read_csv(data_path)

# Refactored spike detection function with clear logic and meaningful names
def detect_spikes(data, sampling_rate, threshold):
    spike_times = []
    for i, value in enumerate(data.iloc[:, 0]):
        # Detect spikes based on threshold and other criteria
        if value > threshold and i % 3 == 0:
            spike_times.append(value)

    # Return the list of detected spikes, sorted in descending order
    return sorted(spike_times, reverse=True)

# Detect spikes using the provided sampling rate and threshold
spikes = detect_spikes(data, SAMPLING_RATE, SPIKE_THRESHOLD)

Organize your functions

  • small
  • do one thing
  • use descriptive names
  • limit amount of arguments

Make your code modular

my_pipeline/
└── preprocess.py
└── analysis.py
└── plot.py
└── read_write.py
└── my_analysis_pipeline.py

An example pipeline

Import your functions and classes in your main analysis pipeline

# file: my_analysis_pipeline.py
from my_project.read_write import load_calcium_data
from my_project.preprocess import extract_fluorescence
from my_project.analysis import analyze_calcium_timeseries
from my_project.plot import make_figures_for_paper

def my_calcium_imaging_pipeline():
    raw_data = load_calcium_data()
    delta_fluorescence = extract_fluorescence(raw_data)
    spike_times_table = analyze_calcium_timeseries(delta_fluorescence)
    make_figures_for_paper(spike_times_table)

    print("Completed")
  
if __name__ == "__main__":
    my_calcium_imaging_pipeline()

To learn more…

Suggested resources:
- Clean Code by Robert C. Martin

Next steps

Now that your logic is split into modules and your pipeline is well crafted you might want to move to the next step…

Package your project πŸ“¦

How can I call my functions from a different project?

my_pipeline/
└── preprocess.py
└── analysis.py
└── plot.py
└── read_write.py
└── my_analysis_pipeline.py

How can I call my functions from a different project?

Mentimeter

Demo! πŸ₯Ί

What does packaging mean?

Packaging allows Python to β€œinstall” your code so that it can be re-used from anywhere.
Local, editable installation from the package directory:

pip install -e .

Demo! 😊

The Python package structure

my-awesome-package/
    β”œβ”€β”€ LICENSE
    β”œβ”€β”€ README.md
    β”œβ”€β”€ pyproject.toml πŸ‘ˆ
    β”œβ”€β”€ ...
    β”œβ”€β”€ my_awesome_package/
    β”‚   └── __init__.py πŸ‘ˆ
    β”‚   └── do_something_awesome.py

Automating package creation

Photo by Neven Krcmarek on Unsplash

Use the NIU cookiecutter template.

conda create -n "package-playground"
conda activate package-playground
pip install cookiecutter
cookiecutter https://github.com/neuroinformatics-unit/python-cookiecutter 

Demo! πŸ“œ

Share your project 🌍

The Python Package Index (PyPi)

This structure will also come in handy when you want to distribute your package widely.

Your package is installable by anyone

pip install my_awesome_package

Summary

└── my-awesome-package/
    β”œβ”€β”€ LICENSE
    β”œβ”€β”€ MANIFEST.in
    β”œβ”€β”€ README.md
    β”œβ”€β”€ pyproject.toml
    β”œβ”€β”€ tox.ini
    β”œβ”€β”€ docs/
    β”œβ”€β”€ my_awesome_package/
    β”‚   └── __init__.py
    β”‚   └── do_something_awesome.py
    └── tests/
        β”œβ”€β”€ __init__.py
        └── test_placeholder.py

Conclusion

What we have learned today

  • Clean up and organize your scripts 🧹
  • Package your project πŸ“¦
  • Share your project 🌍

Other areas of good practice

  • testing
  • writing documentation
  • asynchronous collaboration (issues and pull requests)

Extra resources

If you want to practice refactoring, testing and documenting we have prepared some exercises

Retrospective

  • Anonymously tell us what you thought on this ideaboardz…
  • … and get in touch anytime!