My boss has given me a task to do the following:
- Parse FASTA files using Biopython.
- From each sequence, extract the part that lies between two restriction sites (KpnI and BamHI).
- Plot the extracted sequences together on one plot, sorted by length, and highlight the bases that code for a certain kind of amino acid sequence.
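For the first two steps, here is a minimal sketch of what I have in mind. It assumes the KpnI site (GGTACC) sits upstream of the BamHI site (GGATCC) on the strand given in the file, that only the first such pair matters, and that the sites themselves should be left out of the result. The names `between_sites` and `inserts_from_fasta` are my own, not anything from Biopython; the only Biopython call is `SeqIO.parse`.

```python
KPNI_SITE = "GGTACC"   # KpnI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def between_sites(seq, left=KPNI_SITE, right=BAMHI_SITE):
    """Return the bases between the first `left` site and the next
    `right` site after it, or None if either site is missing.

    Assumption: the recognition sites themselves are excluded.
    """
    seq = str(seq).upper()
    start = seq.find(left)
    if start == -1:
        return None
    start += len(left)            # skip past the left site
    end = seq.find(right, start)  # search only downstream of it
    if end == -1:
        return None
    return seq[start:end]


def inserts_from_fasta(path):
    """Map record id -> extracted region for every record in a FASTA file."""
    # Biopython is imported here so between_sites stays usable without it.
    from Bio import SeqIO

    inserts = {}
    for record in SeqIO.parse(path, "fasta"):
        region = between_sites(record.seq)
        if region is not None:    # skip records lacking either site
            inserts[record.id] = region
    return inserts
```

Plain string searching like this ignores the reverse strand and the actual cut positions inside each site, so it may be too naive for some constructs.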
The end result is supposed to look like a pretty version of: (the parts in brackets are supposed to be the highlighted bases)
Seq 1 | ATCGGATC .... [ATCG .. ] ...
Seq 2 | ACCATC ... [ some more highlighted bases, not necessarily in the same position, or with the same length] ...
Seq p | Some more bases.
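A rough sketch of how I imagine the layout could be drawn with matplotlib: one `ax.text` call per base, one row per sequence. It assumes the highlighted region for each sequence is already known as a half-open `(start, end)` pair of base indices, and that "sorted by length" means longest first; both are my assumptions. `layout_rows` and `plot_sequences` are hypothetical helper names.

```python
def layout_rows(inserts, highlights):
    """Order sequences longest-first and attach each one's highlight span.

    inserts:    dict of name -> base string
    highlights: dict of name -> (start, end) half-open indices (assumed input)
    """
    names = sorted(inserts, key=lambda n: len(inserts[n]), reverse=True)
    return [(n, inserts[n], highlights.get(n)) for n in names]


def plot_sequences(inserts, highlights, out_path="sequences.png"):
    """Draw every base as text; highlighted bases are red and bold."""
    import matplotlib.pyplot as plt

    rows = layout_rows(inserts, highlights)
    width = max(len(seq) for _, seq, _ in rows)
    fig, ax = plt.subplots(figsize=(0.25 * width + 1, 0.4 * len(rows) + 0.5))

    for y, (name, seq, span) in enumerate(rows):
        for x, base in enumerate(seq):
            hit = span is not None and span[0] <= x < span[1]
            ax.text(x, y, base, family="monospace",
                    ha="center", va="center",
                    color="red" if hit else "black",
                    fontweight="bold" if hit else "normal")

    ax.set_xlim(-1, width)
    ax.set_ylim(len(rows) - 0.5, -0.5)  # inverted so the first row is on top
    ax.set_yticks(range(len(rows)))
    ax.set_yticklabels([name for name, _, _ in rows])
    ax.set_xticks([])
    for spine in ax.spines.values():
        spine.set_visible(False)

    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return out_path
```

Calling `plot_sequences({"Seq 1": "ATCGGATCATCG", "Seq 2": "ACCATC"}, {"Seq 1": (8, 12)})` would write a PNG with the last four bases of Seq 1 highlighted. Drawing one text object per base will probably get slow for long sequences.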
My boss would like this put together in Python, preferably with matplotlib. I am a lowly statistician by training and could probably knock something like this out in R, but I am not as familiar with matplotlib.
From looking at some examples, I imagine I could try something like this plot, but I'm unsure how to get started. Has anyone come across a similar problem?