An introduction to collaborative coding and software engineering good practice
You've written your analysis script, and it "works"… but it's chaotic:
Don't move on to the next project! Your work isn't done yet.
def calculate_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time < fastest_time:
            fastest_time = time
    return fastest_time

def print_fastest_time(time_list):
    fastest_time = time_list[0]
    for time in time_list:
        if time < fastest_time:
            fastest_time = time
    print(f"This is the fastest time: {fastest_time}")
A code smell is a surface indication that usually corresponds to a deeper problem in the system.
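The two functions above are near-duplicates: the same loop appears twice, so a bug fix in one must be remembered in the other. One possible refactor (a sketch, not the only option) removes the duplication by reusing the calculation:

```python
def calculate_fastest_time(time_list):
    # The fastest time is the smallest value in the list.
    fastest_time = time_list[0]
    for time in time_list:
        if time < fastest_time:
            fastest_time = time
    return fastest_time

def print_fastest_time(time_list):
    # Reuse the existing function instead of duplicating the loop.
    print(f"This is the fastest time: {calculate_fastest_time(time_list)}")
```

Now the comparison logic lives in exactly one place.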
my_project/
└── do_all_analysis.py
a = 500
b = 0.75

# from michael old analysis pipeline
def process_data(T, sr, t):
    s = [x * t for i, x in enumerate(T.iloc[:, 0]) if (x % t) > sr and i % 3 == 0]
    return sorted(s, reverse=True)[:len(s)//2] if len(s) > 5 else s

import pandas as pd
data = pd.read_csv("/Users/helen/Desktop/29104629.csv")
spikes = process_data(data, a, b)
import pandas as pd

# Constants for sampling rate and spike threshold
SAMPLING_RATE = 500  # Hz
SPIKE_THRESHOLD = 0.75  # Arbitrary units

# Load the dataset
# Avoid hardcoded paths and non-meaningful filenames; use a descriptive path.
data_path = input("Path to the recording CSV: ")
data = pd.read_csv(data_path)

# Refactored spike detection function with clear logic and meaningful names
def detect_spikes(data, sampling_rate, threshold):
    spike_times = []
    for i, value in enumerate(data.iloc[:, 0]):
        # Detect spikes based on threshold and other criteria
        if value > threshold and i % 3 == 0:
            spike_times.append(value)
    # Return the list of detected spikes, sorted in descending order
    return sorted(spike_times, reverse=True)

# Detect spikes using the provided sampling rate and threshold
spikes = detect_spikes(data, SAMPLING_RATE, SPIKE_THRESHOLD)
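The thresholding logic at the heart of `detect_spikes` can be exercised on a plain list, independent of pandas. The sample values below are made up purely for illustration:

```python
# Hypothetical sample values standing in for one column of the DataFrame.
values = [0.9, 0.2, 0.1, 0.8, 0.3, 0.95, 0.7]
SPIKE_THRESHOLD = 0.75

# Same criteria as detect_spikes: above threshold, and only every third index.
spike_times = [v for i, v in enumerate(values) if v > SPIKE_THRESHOLD and i % 3 == 0]
print(sorted(spike_times, reverse=True))  # → [0.9, 0.8]
```

Isolating the logic like this also makes it easy to unit-test, which pays off in the testing step later on.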
my_pipeline/
├── preprocess.py
├── analysis.py
├── plot.py
├── read_write.py
└── my_analysis_pipeline.py
Import your functions and classes in your main analysis pipeline
# file: my_analysis_pipeline.py
from my_pipeline.read_write import load_calcium_data
from my_pipeline.preprocess import extract_fluorescence
from my_pipeline.analysis import analyze_calcium_timeseries
from my_pipeline.plot import make_figures_for_paper

def my_calcium_imaging_pipeline():
    raw_data = load_calcium_data()
    delta_fluorescence = extract_fluorescence(raw_data)
    spike_times_table = analyze_calcium_timeseries(delta_fluorescence)
    make_figures_for_paper(spike_times_table)
    print("Completed")

if __name__ == "__main__":
    my_calcium_imaging_pipeline()
Suggested resources:
- Clean Code by Robert C. Martin
Now that your logic is split into modules and your pipeline is well crafted, you might want to move to the next step…
my_pipeline/
├── preprocess.py
├── analysis.py
├── plot.py
├── read_write.py
└── my_analysis_pipeline.py
my-awesome-package/
├── LICENSE
├── README.md
├── pyproject.toml
├── ...
└── my_awesome_package/
    ├── __init__.py
    └── do_something_awesome.py
This structure will also come in handy when you want to distribute your package widely.
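A minimal `pyproject.toml` for this layout might look like the sketch below; the name, version, and dependency list are placeholders to adapt, not prescriptions:

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "my-awesome-package"
version = "0.1.0"
description = "Does something awesome"
readme = "README.md"
dependencies = ["pandas"]
```

With this file in place, `pip install -e .` installs the package in editable mode so your modules are importable from anywhere.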
my-awesome-package/
├── LICENSE
├── MANIFEST.in
├── README.md
├── pyproject.toml
├── tox.ini
├── docs/
├── my_awesome_package/
│   ├── __init__.py
│   └── do_something_awesome.py
└── tests/
    ├── __init__.py
    └── test_placeholder.py
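Each module in `tests/` exercises one piece of the package. As a sketch of what `test_placeholder.py` could contain, here is a pytest-style test; `awesome_sum` is a hypothetical function standing in for whatever `do_something_awesome.py` actually exposes:

```python
# tests/test_placeholder.py
# `awesome_sum` is a hypothetical stand-in for a function imported from
# my_awesome_package.do_something_awesome; defined inline here for illustration.
def awesome_sum(values):
    return sum(values)

def test_awesome_sum():
    assert awesome_sum([1, 2, 3]) == 6
    assert awesome_sum([]) == 0
```

Running `pytest` from the project root will discover and run any function whose name starts with `test_`.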
If you want to practice refactoring, testing, and documenting, we have prepared some exercises.
Collaborative coding and software engineering good practice | 2024-10-02