Program To Visualize Gene Models With Highlighted Protein Features
5
9.9 years ago
Christian ★ 3.0k

I have the following input:

(1) List of protein-coding gene models in GFF3 format (2) List of protein features of these genes (e.g. protein domains or transmembrane regions), with coordinates on the protein sequence-level.

As output, I would like to have an image that shows all gene models drawn to scale side-by-side, with protein features mapped onto coding exons. Different feature types should be drawn in different colors (configurable), and features are allowed to span exon-exon junctions. Ideally, the generation of such an image can be completely automated.

Example image:

Can anyone suggest a program that can do this?

visualization gene gene • 6.2k views
5
9.5 years ago
Christian ★ 3.0k

I ended up implementing two new BioPerl modules myself that, when used in combination, solve this problem. I just uploaded both modules to GitHub:

Quoting from the module descriptions:

Bio::Graphics::Glyph::decorated_gene - A GFF3-compatible gene glyph with protein decorations.

This glyph extends the functionality of the Bio::Graphics::Glyph::gene glyph and allows to draw protein decorations (e.g., signal peptides, transmembrane domains, protein domains) on top of gene models. Currently, the glyph can draw decorations in form of colored or outlined boxes inside or around CDS segments. Protein decorations are specified at the 'mRNA' transcript level in protein coordinates. Protein coordinates are automatically mapped to nucleotide coordinates by the glyph. Decorations are allowed to span exon-exon junctions, in which case decorations are split between exons. By default, the glyph automatically assigns different colors to different types of protein decorations, whereas decorations of the same type are always assigned the same color.

and

Bio::Draw::FeatureStack - BioPerl module to generate GD images of stacked gene models

FeatureStack creates GD images of vertically stacked gene models to facilitate visual comparison of gene structures. Compared genes can be clusters of orthologous genes, gene family members, or any other genes of interest. FeatureStack takes an array of BioPerl feature objects as input, projects them onto a common coordinate space, flips features on the negative strand (optional), left-aligns them by start coordinates (optional), sets a fixed intron size (optional), removes unwanted transcripts/isoforms (optional), and then draws the so transformed features with a user-specified glyph. Output images can be generated in SVG (scalable vectorized image) or PNG (rastered image) format.
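The transformations listed in that description (flipping negative-strand features, left-aligning them, fixing the intron size) can be illustrated with a small Python sketch. This is not FeatureStack's code or API. The function `normalize_gene`, its parameters, and the representation of a gene as a list of 1-based inclusive `(start, end)` exon tuples are all assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive (start, end); hypothetical representation


def normalize_gene(exons: List[Exon], strand: str,
                   intron_size: Optional[int] = None) -> List[Exon]:
    """Project one gene's exons onto a drawing axis that starts at 1.

    Negative-strand genes are mirrored so the 5' end is on the left.
    If intron_size is given, every intron is drawn with that fixed length.
    """
    exons = sorted(exons)
    gene_start, gene_end = exons[0][0], exons[-1][1]
    if strand == "-":
        # Mirror each exon around the gene's own span.
        exons = sorted((gene_end - e + gene_start, gene_end - s + gene_start)
                       for s, e in exons)

    projected = []
    cursor = 1          # next free position on the drawing axis
    prev_end = None
    for s, e in exons:
        if prev_end is not None:
            true_gap = s - prev_end - 1
            cursor += intron_size if intron_size is not None else true_gap
        length = e - s + 1
        projected.append((cursor, cursor + length - 1))
        cursor += length
        prev_end = e
    return projected
```

Applying this to every gene before drawing puts all models on the same left-aligned axis, which is what makes exon structures directly comparable when stacked.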

Here is an example output of FeatureStack:

1

FeatureStack and decorated_gene are now available from CPAN as well: http://search.cpan.org/~chrisfr/

1

A paper describing this module is now published in Bioinformatics: http://bioinformatics.oxfordjournals.org/content/early/2012/09/27/bioinformatics.bts572.short

5
9.9 years ago
ALchEmiXt ★ 1.9k

You might consider using Artemis for that (developed by Sanger). It can read many formats and feature files, including GFF, GenBank, EMBL, and BAM. It's Java-based. There is also a multiple-sequence comparison version called ACT.

I think Artemis can at least show the multiple features as separate tracks, but it also has a one-line merge option. So I guess you might be able to pull it off by displaying the proteins on top of the other features...

Example of gene builder in Artemis:

Example main Artemis window:

Images are from the paper by Carver et al. (2008).

0

Great suggestion, +1. Artemis has come a long way. Just tried, it can indeed display features on top of gene models (example here http://bit.ly/wlc3d6), but it seems one has to provide nucleotide and not protein coordinates, correct?

Also, I was looking for a solution that can be completely automated (just edited my question to make that clear). With Artemis, if I wanted to compare many gene models, I would have to look them up individually and cannot compare them side-by-side. Please let me know if I am wrong on this.

0

@Christian: It seems you can control Artemis through its command-line API, but personally I have not done so yet. Maybe in the future, indeed for automation. Dropping Tim Carver an email might help. He is usually quite responsive on the Artemis mailing lists.

0

This just popped into my mind (happens a lot lately... :-)): for comparison's sake, you can basically do a multi-track comparison using the Artemis Comparison Tool (ACT), also from Sanger.

4
9.9 years ago
Scott Cain ▴ 750

In an upcoming GBrowse release, we're getting ready to roll out new functionality that will allow transparent glyphs, and I think you'd be able to do this, though it would be easier if you transformed the protein coordinates to DNA coordinates. I don't have a release schedule, but we generally get them out pretty fast.
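For reference, transforming protein coordinates to DNA coordinates could look roughly like the Python sketch below. It assumes a complete CDS that starts in phase 0, 1-based inclusive coordinates, and CDS segments given as `(start, end)` tuples; the function name `protein_to_genomic` and its parameters are hypothetical and not part of GBrowse or BioPerl. A feature that spans an exon-exon junction comes back as more than one genomic segment.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive genomic (start, end)


def protein_to_genomic(cds: List[Segment], strand: str,
                       aa_start: int, aa_end: int) -> List[Segment]:
    """Map a protein interval [aa_start, aa_end] onto genomic CDS segments.

    Assumes the first CDS segment starts in phase 0 (illustrative assumption).
    Segments are returned in transcript (5'->3') order.
    """
    # Walk the CDS segments in transcript order.
    ordered = sorted(cds, reverse=(strand == "-"))
    # Convert amino-acid positions to 0-based, half-open CDS nucleotide offsets.
    nt_from = (aa_start - 1) * 3
    nt_to = aa_end * 3

    segments = []
    offset = 0  # CDS nucleotides consumed by previous segments
    for start, end in ordered:
        length = end - start + 1
        lo = max(nt_from, offset)
        hi = min(nt_to, offset + length)
        if lo < hi:  # the feature overlaps this CDS segment
            if strand == "-":
                seg_end = end - (lo - offset)
                seg_start = end - (hi - offset) + 1
            else:
                seg_start = start + (lo - offset)
                seg_end = start + (hi - offset) - 1
            segments.append((seg_start, seg_end))
        offset += length
    return segments
```

For example, with CDS segments `(101, 109)` and `(201, 212)` on the plus strand, amino acids 3 to 5 span the junction and map to two pieces, one at the end of the first segment and one at the start of the second.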

0

I am currently tinkering with BioPerl, with some success. I would love it if Bio::Graphics could generate such annotated gene models out of the box.

One question Scott: Do I really need transparent glyphs to do this? Is there no other way in Bio::Graphics to draw glyphs on top of each other?

0

No, you don't need transparent glyphs; they would just make it really easy. You could also write your own glyph, which isn't really hard.

3
9.9 years ago
ALchEmiXt ★ 1.9k

An alternative that just popped into my mind is genoPlotR. Basically, it lets you "program" your genome-based graphics entirely... of course, knowledge of R is required.

0

Definitely an interesting possibility. However, I think genoPlotR is too much of a general-purpose tool for my specific problem. Of course, if anyone has written genoPlotR code that does what I am looking for... greatly appreciated!

1
9.9 years ago

I use the AnnotationSketch tool that comes with the GenomeTools library for most of my gene annotation graphics. You can provide the tool with a style file that allows you to define colors, shapes, etc for different feature types, and also control how features collapse (i.e., which feature types have their own tracks and which feature types should be plotted on their "Parent" features).

AnnotationSketch can definitely do what you describe, although you would have to create a style file and make sure that feature relationships are defined properly in the GFF3 file. However, I've been very pleased with how responsive the GenomeTools mailing list is, so you shouldn't have any trouble getting the help you need.

1

No, I don't think there are Perl bindings, but they do have bindings for C, Ruby, Python, Lua... I have written C programs that use the C bindings, but most of the time I used the command-line tool gt sketch, which doesn't require anything other than a GFF3 file (and a style file if you don't like the default style).

0

Nice! Setting up/designing graphs requires programming in C or Python, right? Pity there's no Perl support... or do I get its usage wrong?