Question: How does StringTie calculate transcript coverage?
ddzhangzz (United States) wrote 16 months ago:

I recently used StringTie to quantify RNA-seq reads mapped to transcripts. Two transcripts of the same gene have exactly the same length, the same number of exons, and the same exon structure, yet their coverage values are very different.

##transcript
t_id   chr    strand  start    end      t_name             num_exons  length  gene_id             gene_name  cov        FPKM
77237  chr17  -       7668402  7687538  ENST00000269305.7  11         2579    ENSG00000141510.14  TP53       31.946598  5.549151
77238  chr17  -       7668402  7687538  ENST00000620739.3  11         2579    ENSG00000141510.14  TP53       2.961419   0.514401

I am wondering how StringTie calculates the coverage. If my understanding of the definition is correct, the coverage is calculated as $\sum_{i=1}^{m} (seq_i \times mapped\_seq\_length_i) / transcript\_length$. If this is true, I would expect the coverage of these two transcripts to be similar, so why are they so different?
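To make that expectation concrete, here is a minimal Python sketch of the naive formula as I read it: total aligned bases divided by transcript length. The function name and the read lengths are hypothetical and chosen only for illustration; this is not how StringTie itself computes `cov`.

```python
def naive_coverage(aligned_lengths, transcript_length):
    """Average per-base depth: total aligned bases / transcript length.

    aligned_lengths: number of bases each read aligns to the transcript.
    This is only the simple formula from the question, not StringTie's method.
    """
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    return sum(aligned_lengths) / transcript_length


# Hypothetical input: 100 reads, each aligning 75 bases to a 2579-bp transcript.
reads = [75] * 100
print(naive_coverage(reads, 2579))
```

Under this formula, two transcripts with identical exon structure would receive the same value whenever the same reads are counted for both, which is why the different `cov` values in the table look surprising.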

rna-seq • 1.2k views

Did you find a solution anywhere else? We are struggling with the same question, and it is not clearly explained anywhere.

written 16 months ago by Vijay Lakhujani

You may want to follow up on this GitHub issue; maybe someone is listening there:

https://github.com/gpertea/stringtie/issues/162

written 16 months ago by Vijay Lakhujani
geo.pertea wrote 16 months ago:

Please see this answer about how StringTie calculates coverage values. Transcript and exon coverage values for overlapping transcripts (alternative isoforms) are calculated after the read alignments have been distributed among them by the maximum flow algorithm; it is not as simple as applying a formula.

For this particular question, without further data I presume that ENST00000269305.7 and ENST00000620739.3 are somehow distinct isoforms (so not exactly identical in their intron-exon structure, otherwise one of them would be discarded when the input file is loaded).


ENST00000269305.7 and ENST00000620739.3 are truly identical in exon structure, even though they are assigned different Ensembl IDs (probably because they have different amino acid sequences). Such cases do not seem to be rare: we found at least "5" duplicated transcripts in one gene. My question is how StringTie treats them. By comparison, Salmon removed one of these duplicated isoforms, but I would still like to know the details for StringTie.
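For anyone who wants to check their own annotation for such cases, below is a minimal Python sketch that groups transcripts sharing exactly the same exon coordinates. It assumes a standard tab-separated GTF with `exon` features and a `transcript_id "..."` attribute; the function names and the toy records (`tA`, `tB`, `tC`) are hypothetical, and it says nothing about how StringTie or Salmon handle duplicates internally.

```python
import re
from collections import defaultdict


def exon_structures(gtf_lines):
    """Map transcript_id -> (chrom, strand, sorted tuple of exon (start, end))."""
    exons = defaultdict(list)
    location = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define the structure
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])
    return {
        tid: (location[tid][0], location[tid][1], tuple(sorted(coords)))
        for tid, coords in exons.items()
    }


def duplicate_groups(gtf_lines):
    """Return lists of transcript IDs that share an identical exon structure."""
    groups = defaultdict(list)
    for tid, structure in exon_structures(gtf_lines).items():
        groups[structure].append(tid)
    return [sorted(tids) for tids in groups.values() if len(tids) > 1]


# Toy records (hypothetical IDs): tA and tB share exons, tC does not.
demo = [
    'chr1\tdemo\texon\t100\t200\t.\t-\t.\tgene_id "g1"; transcript_id "tA";',
    'chr1\tdemo\texon\t300\t400\t.\t-\t.\tgene_id "g1"; transcript_id "tA";',
    'chr1\tdemo\texon\t300\t400\t.\t-\t.\tgene_id "g1"; transcript_id "tB";',
    'chr1\tdemo\texon\t100\t200\t.\t-\t.\tgene_id "g1"; transcript_id "tB";',
    'chr1\tdemo\texon\t100\t250\t.\t-\t.\tgene_id "g1"; transcript_id "tC";',
]
print(duplicate_groups(demo))  # [['tA', 'tB']]
```

To run it on a real annotation, pass an open file handle instead of `demo`, since the functions only need an iterable of lines.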

modified 16 months ago • written 16 months ago by ddzhangzz