Hi all,

My boss has given me a task to do the following:

- Take FASTA files and parse them using Biopython.
- For each sequence, extract the part that lies between two restriction sites (KpnI and BamHI).
- Plot the extracted sequences together on one plot, sorted by length, and highlight the bases that correspond to a certain kind of amino acid sequence that they code for.

The end result is supposed to look like a prettier version of the following (the parts in brackets are the bases to be highlighted):

Seq 1 | ATCGGATC .... [ATCG .. ] ...

Seq 2 | ACCATC ... [ some more highlighted bases, not necessarily in the same position, or with the same length] ...

..

Seq p | Some more bases.

My boss would like this to be put together in Python, preferably with matplotlib. I am a lowly statistician by training and could probably knock something like this out in R, but I am not as familiar with matplotlib.

Looking at some examples, I imagine I could adapt something like this plot,

http://matplotlib.org/examples/lines_bars_and_markers/marker_fillstyle_reference.html

but I'm unsure how to get started. Has anyone come across a similar problem?

a. If your boss has given you a task, you should try something yourself before asking for help.

b. If you're asking for help, you should give us a lot more details, including the steps you've taken to solve the problem, so we know you're not taking a shortcut.

hahaha, sorry I hit enter too quickly.
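
For anyone landing here with the same task, below is a minimal sketch of one way to get started, not a definitive implementation. It makes several assumptions that are not in the question: the KpnI site (GGTACC) comes before the BamHI site (GGATCC) on the strand as given, the extracted part excludes both recognition sequences, and the region to highlight is supplied as a plain DNA motif rather than found by translating to amino acids. The function names (between_sites, find_spans, load_inserts, plot_inserts) and the output file name are made up for illustration.

```python
# Recognition sequences: KpnI cuts GGTAC^C, BamHI cuts G^GATCC.
KPNI_SITE = "GGTACC"
BAMHI_SITE = "GGATCC"


def between_sites(seq, left=KPNI_SITE, right=BAMHI_SITE):
    """Return the bases between the first `left` site and the next `right` site.

    Returns None if either site is missing. Assumption: the result excludes
    both recognition sequences, and `left` occurs upstream of `right`.
    """
    seq = seq.upper()
    start = seq.find(left)
    if start == -1:
        return None
    start += len(left)
    end = seq.find(right, start)
    if end == -1:
        return None
    return seq[start:end]


def find_spans(seq, motif):
    """Return (start, end) pairs for every occurrence of `motif` (overlaps allowed)."""
    spans = []
    pos = seq.find(motif)
    while pos != -1:
        spans.append((pos, pos + len(motif)))
        pos = seq.find(motif, pos + 1)
    return spans


def load_inserts(fasta_path):
    """Parse a FASTA file with Biopython; keep records that contain both sites."""
    from Bio import SeqIO  # imported here so the helpers above work without Biopython

    inserts = {}
    for record in SeqIO.parse(fasta_path, "fasta"):
        insert = between_sites(str(record.seq))
        if insert is not None:
            inserts[record.id] = insert
    return inserts


def plot_inserts(inserts, motif, out_path="inserts.png"):
    """Draw one row of letters per sequence, shortest first, motif bases in red."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    rows = sorted(inserts.items(), key=lambda item: len(item[1]))
    longest = max((len(seq) for _, seq in rows), default=1)
    fig, ax = plt.subplots(figsize=(0.15 * longest + 2, 0.3 * len(rows) + 1))
    for y, (name, seq) in enumerate(rows):
        highlighted = set()
        for start, end in find_spans(seq, motif):
            highlighted.update(range(start, end))
        for x, base in enumerate(seq):
            ax.text(x, y, base, family="monospace", ha="center", va="center",
                    color="red" if x in highlighted else "black")
    ax.set_yticks(range(len(rows)))
    ax.set_yticklabels([name for name, _ in rows])
    ax.set_xlim(-1, longest)
    ax.set_ylim(len(rows) - 0.5, -0.5)  # shortest sequence at the top
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

With Biopython and matplotlib installed, something like plot_inserts(load_inserts("my_sequences.fasta"), motif="ATCG") would write the figure, where the file name and motif are placeholders. Drawing each base with its own ax.text call keeps the sketch simple but will likely be slow for many long sequences.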