Biologist-friendly programming

I learned to code (to really code) at the age of about 33. Before that, I did data analysis in programs like Excel and Origin, and sometimes I would write small Matlab scripts.

But sometime in 2016 or so, a course in Python programming presented itself to me, and I took it. I spent three days learning about the difference between ints and strings, and how in Python you add strings like ints (foo + bar = foobar). Fantastic, but now what?

Luckily, at that time I had some real issues that needed coding skills: my work consisted of analysing a dataset of 60 microscopy slides on which every cell was assigned a signal strength for its nucleus, cytoplasm and membrane. Since every microscopy slide contained more than 300,000 cells, that was a lot of data. So much data that the program I used at the time (Origin) would give me frequent coffee breaks of 15 minutes or more as it drew the graphs I needed to draw some conclusions.
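To give an idea of the shape of that data: one row per cell, with one signal column per cellular compartment. A tiny synthetic stand-in (the column names here are my invention, not the original export) might look like this:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one slide: one row per cell, one signal column
# per compartment (column names and value ranges are illustrative).
rng = np.random.default_rng(42)
n_cells = 300_000
data = pd.DataFrame({
    "cell_id": np.arange(n_cells),
    "nucleus": rng.normal(100, 15, n_cells),
    "cytoplasm": rng.normal(80, 10, n_cells),
    "membrane": rng.normal(60, 8, n_cells),
})
print(data.shape)  # (300000, 4)
```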

Time to code! But how? For some reason I can’t remember, I ran into Jupyter Lab (which was a bit different back then, now it’s even more awesome).

I exported my data to a .tsv file and typed this into my first cell:

import pandas as pd
data = pd.read_csv('my_data.tsv', sep='\t')
data

and hit Enter. Glorious: there was my data, shown as a table (the first 10 lines, and the first 10 and last 10 columns) in my notebook.
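The sep='\t' argument is what tells read_csv the file is tab-separated rather than comma-separated. A self-contained round trip (with a throwaway temp file) shows it in action:

```python
import os
import tempfile

import pandas as pd

# Round trip: write a tiny tab-separated file, then read it back.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
path = os.path.join(tempfile.mkdtemp(), "tiny.tsv")
df.to_csv(path, sep="\t", index=False)

# sep="\t" tells pandas the delimiter is a tab, not the default comma.
data = pd.read_csv(path, sep="\t")
print(data.equals(df))  # True
```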

I could address the individual columns of the data like this:

data['column_name']

and make histograms of all columns like this:

data.hist()
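Those two one-liners hide a lot of pandas machinery: a single column comes back as a Series with summary statistics built in, and .hist() draws one histogram per numeric column via matplotlib. A self-contained sketch with a small made-up frame (the column names are my own):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed

import pandas as pd

# A small made-up frame standing in for the per-cell measurements.
data = pd.DataFrame({
    "nucleus": [95.0, 102.3, 110.1, 88.7],
    "cytoplasm": [80.2, 75.5, 90.0, 70.1],
})

col = data["nucleus"]   # a single column is a pandas Series
print(col.mean())       # summary statistics come for free
print(col.describe())   # count, mean, std, min, quartiles, max

data.hist()             # one matplotlib histogram per numeric column
```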

Pretty soon I discovered Seaborn and within months I was looping over huge datasets, melting DataFrames and creating the most beautiful plots of categorical data, literally with just a handful of lines of code.
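The "melting" step is pandas' DataFrame.melt: it turns one column per compartment into a long table of (compartment, value) pairs, which is exactly the shape Seaborn's categorical plots expect. A minimal sketch with invented column names:

```python
import pandas as pd

# Wide format: one signal column per compartment (names are illustrative).
wide = pd.DataFrame({
    "cell_id": [1, 2, 3],
    "nucleus": [95.0, 102.3, 110.1],
    "membrane": [60.2, 55.5, 70.0],
})

# Long format: one row per (cell, compartment) measurement.
long = wide.melt(id_vars="cell_id", var_name="compartment", value_name="signal")
print(long)

# With Seaborn, the long frame then plots in one line, e.g.:
#   import seaborn as sns
#   sns.boxplot(data=long, x="compartment", y="signal")
```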

I was sold; I was now a creature that turns coffee into code. I had always looked at programming as a form of magic, but it's just like learning French: spend time with the French, speak French, get insulted for doing it poorly, and try again. It's so satisfying and effective that it soon becomes addictive.

In the early days I’d often export data to Excel, like so:

data.to_excel('my_data.xlsx')

do some things I could only do in Excel, and read it back in, like so:

data = pd.read_excel('my_data.xlsx')

There is no shame in doing that. But by now, Jupyter, pandas and Seaborn are my preferred tools for analysing data, and I am often frustrated when I need to drop to Excel for anything.