Question: IGV tracks gene annotation
11 months ago by
bitpir130
bitpir130 wrote:

Hi there, I'm trying to visualize a reference genome in IGV and annotate the genes using a custom-made GFF file. The IGV snapshot looks like the screenshot below. I am particularly curious about the pink-labeled track. The GFF for the pink track looks something like this:

NC_023010.2 glimmer cds 5011 3686 3.08 - 1 orf00005;

NC_023010.2 glimmer cds 5052 5264 0.82 + 2 orf00006;

NC_023010.2 glimmer cds 5637 6800 3.11 + 2 orf00007;

As you can see, orf00005 is drawn differently from the others. Does anyone know why? Is it a partial gene?

[Screenshot: Screen Shot 2018 05 04 at 4 27 59 PM]

Thanks for the help!

igv gff gene • 612 views
11 months ago by
h.mon24k
Brazil
h.mon24k wrote:

Probably IGV doesn't like the Glimmer gff, as it does not conform to the specification:

Columns 4 & 5: "start" and "end"

[...] Start is always less than or equal to end.

For orf00005, start > end (5011 > 3686).
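
To see whether other records in the file have the same problem, something like the following could be used. It is only a minimal sketch: the function name `find_reversed_features` is made up for illustration, and it assumes the file is tab-separated with start and end in columns 4 and 5, as the specification describes.

```python
def find_reversed_features(lines):
    """Return (line_number, seqid, start, end) for records where start > end."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # coordinates are not integers; ignore in this sketch
        if start > end:
            problems.append((line_number, fields[0], start, end))
    return problems


# The three Glimmer records from the question, written with tabs.
example = [
    "NC_023010.2\tglimmer\tcds\t5011\t3686\t3.08\t-\t1\torf00005;",
    "NC_023010.2\tglimmer\tcds\t5052\t5264\t0.82\t+\t2\torf00006;",
    "NC_023010.2\tglimmer\tcds\t5637\t6800\t3.11\t+\t2\torf00007;",
]
print(find_reversed_features(example))  # [(1, 'NC_023010.2', 5011, 3686)]
```

For a real file, the same function can be given an open file handle instead of a list, since it only iterates over lines.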


Hmm, I don't think that's the problem, because there are other ORFs that go in the reverse direction too (HISP_RS15385). I found this answer on another site (https://biology.stackexchange.com/questions/68431/clarification-on-refseq-genes-track-on-igv): the thinner line is supposed to be an untranslated region. Now I have to figure out why this feature is drawn that way while the features in the other tracks are treated as translated regions.

written 11 months ago by bitpir130

ORFs that "go in reverse direction" have nothing to do with the start and end coordinates; the direction is an indication of strand:

Column 7: "strand"

The strand of the feature. + for positive strand (relative to the landmark), - for minus strand, and . for features that are not stranded. In addition, ? can be used for features whose strandedness is relevant, but unknown.

The feature you indicated (HISP_RS15385) is on the minus strand, as is orf00005, hence both have left-facing arrows. However, if you look at the GFF, its start coordinate is less than its end coordinate (36612 < 37661):

NC_023010.2 RefSeq  gene    36612   37661   .   -   .   ID=gene-HISP_RS15385;Dbxref=GeneID:23802828;Name=HISP_RS15385;gbkey=Gene;gene_biotype=protein_coding;locus_tag=HISP_RS15385;old_locus_tag=HISP_16005
NC_023010.2 Protein Homology    CDS 36612   37661   .   -   0   ID=cds35;Parent=gene-HISP_RS15385;Dbxref=Genbank:WP_014030602.1,GeneID:23802828;Name=WP_014030602.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_014030602.1;product=radical SAM protein;protein_id=WP_014030602.1;transl_table=11
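
If the only problem in the Glimmer file is that minus-strand records list the larger coordinate first, the records could be rewritten so that start <= end while the strand column is left alone. The sketch below is one possible way to do that, not a tested converter: `normalize_coordinates` is a hypothetical helper, and it assumes tab-separated columns with integer coordinates.

```python
def normalize_coordinates(line):
    """Swap columns 4 and 5 when start > end; leave all other columns untouched."""
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    # Pass comments and short (non-feature) lines through unchanged.
    if stripped.startswith("#") or len(fields) < 5:
        return stripped
    start, end = int(fields[3]), int(fields[4])
    if start > end:
        # The strand column (7) already carries the direction, so only
        # the coordinate order needs to change.
        fields[3], fields[4] = str(end), str(start)
    return "\t".join(fields)


record = "NC_023010.2\tglimmer\tcds\t5011\t3686\t3.08\t-\t1\torf00005;"
print(normalize_coordinates(record))
```

After this change orf00005 would span 3686 to 5011 on the minus strand, the same layout the RefSeq records above use. Whether a plain swap is right for every record depends on how the Glimmer output was converted to GFF, so it is worth checking a few features in IGV afterwards.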
modified 11 months ago • written 11 months ago by h.mon24k

I see! Got it, thank you so much for pointing that out. It's really weird that the Glimmer GFF is formatted that way. I'll recheck the formatting of that file. Thanks!

written 11 months ago by bitpir130