Getting started with Python for biology

Python has become one of the dominant programming languages for asking and answering questions in biology. This is partly because it is relatively simple to learn yet flexible and powerful, and partly because it has extensive library support for biological tools.

Getting started

There are many ways to run Python, but one of the easiest ways to get started is Google Colaboratory. Colaboratory is essentially a Jupyter Notebook that runs online. It has many popular packages pre-installed and avoids much of the hassle of a local Python installation.

Jupyter Notebooks are a great tool for keeping track of what the user has tried, and they seamlessly integrate spaces where the user can describe their thought process or other notes alongside the code. Conceptually, a notebook is similar to the lab notebook one might keep in a wet lab.

If a local installation of Python is preferred, Anaconda is a great choice. Anaconda is a data science platform that installs many useful tools, including Jupyter Notebook, and makes package management easier. Packages contain libraries of useful code that other people have written, which makes Python easier to use. Biopython is one such library, and it contains indispensable tools such as parsers and writers for FASTA and GenBank (GBK) files, as well as functions for manipulating DNA and protein sequences.
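As a minimal sketch of the Biopython tools mentioned above, the cell below parses FASTA records, writes them back out, and manipulates a DNA sequence. The two sequences are made up for illustration. The FASTA text is held in memory with `io.StringIO` so the example runs without any files; with real data you would pass a file path instead. GenBank files can be read the same way by using the format name `"genbank"` in place of `"fasta"`.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA text standing in for a file on disk. With a real file you
# would write something like SeqIO.parse("my_sequences.fasta", "fasta")
# (hypothetical filename).
fasta_text = """>gene1 example sequence
ATGGCCATTGTAATGGGCCGCTGA
>gene2 another example
ATGAAACGCATTAGCTAA
"""

# SeqIO.parse yields one SeqRecord per FASTA entry; record.id is the first
# word after the ">".
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    print(record.id, len(record.seq))

# SeqIO.write does the reverse and returns the number of records written.
out = StringIO()
n_written = SeqIO.write(records, out, "fasta")

# Seq objects support common DNA/protein manipulations.
dna = records[0].seq
print(dna.reverse_complement())
print(dna.translate(to_stop=True))  # translate until the first stop codon
```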

Running Jupyter

Installing Packages

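Once you feel comfortable with Jupyter Notebook, you will need to install some missing packages. This can be done with `pip`, the Python package installer. In an empty cell, run `!pip3 install biopython` (the leading `!` tells Jupyter to run the line as a shell command). In a local installation this only needs to be run once, and the package will persist across sessions. On Google Colaboratory, however, installed packages are lost when the runtime is reset, so the cell may need to be rerun at the start of a new session.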
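After installing, one quick way to confirm that Biopython is available is to import it in a new cell and print its version. This is a small sketch. The check with `importlib.util.find_spec` simply prints a reminder instead of raising an `ImportError` if the install has not happened yet.

```python
import importlib.util

# find_spec returns None when a top-level package cannot be found on the
# current Python path. Biopython's import name is "Bio".
if importlib.util.find_spec("Bio") is None:
    print("Biopython is not installed; run the pip command first.")
else:
    import Bio

    print("Biopython version:", Bio.__version__)
```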

Additional test data and an example of breseq output:

FASTQ reads for E. coli strain REL8593A (200M)
Reference genome (REL606)
Example of *breseq* output
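Files of FASTQ reads like the one above can be large, so it helps to stream them rather than load every read into memory. The sketch below uses `Bio.SeqIO.parse` to iterate over reads one at a time and compute the read count, mean read length, and mean base quality. The two reads are invented for illustration, the filename in the comment is hypothetical, and the example assumes standard Phred+33 quality encoding (Biopython's `"fastq"` format).

```python
from io import StringIO

from Bio import SeqIO

# Two made-up reads in FASTQ format (Phred+33 encoding: "I" = 40, "#" = 2).
# With the real download you would pass a path instead, e.g.
# SeqIO.parse("REL8593A.fastq", "fastq"), where the filename is hypothetical.
fastq_text = """@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
GGGCCCAAAT
+
##########
"""

n_reads = 0
total_bases = 0
total_quality = 0

# SeqIO.parse yields one record at a time, so memory use stays small even
# for a large FASTQ file.
for record in SeqIO.parse(StringIO(fastq_text), "fastq"):
    quals = record.letter_annotations["phred_quality"]  # list of ints per base
    n_reads += 1
    total_bases += len(record.seq)
    total_quality += sum(quals)

mean_length = total_bases / n_reads
mean_quality = total_quality / total_bases
print(f"{n_reads} reads, mean length {mean_length:.1f}, mean base quality {mean_quality:.1f}")
```

If the downloaded file is gzip-compressed, it can be opened in text mode with `gzip.open(path, "rt")` and the resulting handle passed to `SeqIO.parse`.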

Contributors to this topic

MattMcGuffie, JeffreyBarrick

Topic revision: r2 - 2020-04-09 - 19:38:07 - Main.MattMcGuffie