Question: same transcript id with STAR quant mode
16 months ago, grant.hovhannisyan wrote:

Hi Biostars,

My GTF file (which I got by converting a GFF file with gffread) contains records like these:

ctro_c_1    CGOB    exon    27749    28666    .    +    .    transcript_id "ctro_CGOB_00001_mRNA"; gene_id "ctro_CGOB_00001"; gene_name "ctro_CGOB_00001";
ctro_c_1    CGOB    CDS    27749    28666    .    +    0    transcript_id "ctro_CGOB_00001_mRNA"; gene_id "ctro_CGOB_00001"; gene_name "ctro_CGOB_00001";
ctro_c_1    CGOB    exon    770839    771455    .    -    .    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";
ctro_c_1    CGOB    exon    771521    771554    .    -    .    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";
ctro_c_1    CGOB    CDS    770839    771455    .    -    2    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";
ctro_c_1    CGOB    CDS    771521    771554    .    -    0    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";

ctro_CGOB_00002 has two exons, but both have the same transcript_id, ctro_CGOB_00002_mRNA. If I use the --quantMode TranscriptomeSAM GeneCounts options with STAR, it will sum the counts from both exons, right?
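To make the structure of these records concrete, here is a minimal Python sketch that parses the six lines above and groups the exon intervals by gene_id. The helper names (parse_gtf_line, exons_by_gene) are made up for illustration and are not part of STAR or gffread. Real GTF files are tab-separated; splitting on any whitespace is an assumption that happens to work for both the pasted text and a real file.

```python
from collections import defaultdict

# Records copied from the question. Real GTF is tab-separated; the pasted
# text uses spaces, so the parser below splits on any whitespace.
GTF_TEXT = """\
ctro_c_1    CGOB    exon    27749    28666    .    +    .    transcript_id "ctro_CGOB_00001_mRNA"; gene_id "ctro_CGOB_00001"; gene_name "ctro_CGOB_00001";
ctro_c_1    CGOB    CDS    27749    28666    .    +    0    transcript_id "ctro_CGOB_00001_mRNA"; gene_id "ctro_CGOB_00001"; gene_name "ctro_CGOB_00001";
ctro_c_1    CGOB    exon    770839    771455    .    -    .    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";
ctro_c_1    CGOB    exon    771521    771554    .    -    .    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";
ctro_c_1    CGOB    CDS    770839    771455    .    -    2    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";
ctro_c_1    CGOB    CDS    771521    771554    .    -    0    transcript_id "ctro_CGOB_00002_mRNA"; gene_id "ctro_CGOB_00002"; gene_name "ctro_CGOB_00002";
"""


def parse_gtf_line(line):
    """Split one GTF record into its 8 fixed fields plus an attribute dict."""
    fields = line.split(None, 8)  # the 9th field keeps the whole attribute string
    attrs = {}
    for item in fields[8].strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, value = item.split(None, 1)
        attrs[key] = value.strip('"')
    return {
        "seqname": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
        "attrs": attrs,
    }


def exons_by_gene(gtf_text):
    """Collect (start, end) of every exon record under its gene_id."""
    genes = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rec = parse_gtf_line(line)
        if rec["feature"] == "exon":
            genes[rec["attrs"]["gene_id"]].append((rec["start"], rec["end"]))
    return dict(genes)


exons = exons_by_gene(GTF_TEXT)
for gene_id, intervals in exons.items():
    print(gene_id, len(intervals), intervals)
# ctro_CGOB_00001 1 [(27749, 28666)]
# ctro_CGOB_00002 2 [(770839, 771455), (771521, 771554)]
```

The two exon lines of ctro_CGOB_00002 end up under one gene_id (and one transcript_id), which is the grouping a per-gene counter works from.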

Thank you very much

rna-seq star gtf
8 weeks ago, manuel.belmadani (Canada) wrote:

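Yes, that's right. Several exon lines sharing one transcript_id is simply how GTF describes a multi-exon transcript, and GeneCounts reports a single count per gene_id, so reads from both exons of ctro_CGOB_00002 go into the same gene count.

From the author of STAR: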

Read counting (e.g. htseq-count, featureCounts or STAR --quantMode GeneCounts) simply counts the number of uniquely mapped reads that overlap exons of each gene.
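As a rough illustration of what "counts the number of uniquely mapped reads that overlap exons of each gene" means, here is a toy Python sketch. It is not STAR's implementation: the reads are invented, strand is ignored, and the function names are hypothetical. Only the exon coordinates come from the GTF in the question.

```python
# Exon intervals per gene (1-based, inclusive), taken from the question's GTF.
EXONS = {
    "ctro_CGOB_00001": [(27749, 28666)],
    "ctro_CGOB_00002": [(770839, 771455), (771521, 771554)],
}

# Hypothetical uniquely mapped reads; each read is a list of aligned blocks,
# so a spliced read has more than one block.
READS = [
    [(27800, 27899)],                      # inside the single exon of gene 00001
    [(770900, 770999)],                    # first exon of gene 00002
    [(771530, 771554)],                    # second exon of gene 00002
    [(771400, 771455), (771521, 771554)],  # spliced read touching both exons of 00002
    [(500000, 500099)],                    # overlaps no annotated exon
]


def overlaps(a, b):
    """True if closed intervals a and b share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_reads(reads, exons):
    """Count each read once for the single gene whose exons it overlaps."""
    counts = {gene: 0 for gene in exons}
    no_feature = 0
    ambiguous = 0
    for blocks in reads:
        hit = {
            gene
            for gene, gene_exons in exons.items()
            if any(overlaps(block, exon) for block in blocks for exon in gene_exons)
        }
        if len(hit) == 1:
            counts[hit.pop()] += 1
        elif not hit:
            no_feature += 1
        else:
            ambiguous += 1  # overlaps more than one gene: not counted for any
    return counts, no_feature, ambiguous


counts, no_feature, ambiguous = count_reads(READS, EXONS)
print(counts)      # {'ctro_CGOB_00001': 1, 'ctro_CGOB_00002': 3}
print(no_feature)  # 1
print(ambiguous)   # 0
```

Reads landing in either exon of ctro_CGOB_00002, including the spliced read that touches both, each add exactly one to the same gene total.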

In the same thread Lior Pachter also mentions an important caveat with gene counts:

The main problem with htseq or featurecounts is that reads are not disambiguated between isoforms of genes, and when these isoforms have different lengths, the naïve counting methods can be very inaccurate. This is not an alignment issue but a quantification issue. In other words, simple counting is wrong because the total gene "counts" obtained by aggregating all reads that map to a gene locus is not, in general, going to be proportional to the gene abundance.
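A small worked example may help show why aggregated counts need not track gene abundance. The numbers below are invented: one gene with a 1000 bp and a 3000 bp isoform, and the simplifying assumption that each transcript copy yields reads in proportion to its length (edge effects and effective length are ignored).

```python
# Hypothetical isoform lengths (bp) for a single gene.
ISOFORM_LENGTHS = {"short": 1000, "long": 3000}

# Assumed sequencing yield: reads per kilobase per transcript copy.
READS_PER_KB_PER_COPY = 50


def gene_level_reads(copies):
    """Reads a naive counter would attribute to the gene, summed over isoforms."""
    return sum(
        n_copies * ISOFORM_LENGTHS[isoform] * READS_PER_KB_PER_COPY // 1000
        for isoform, n_copies in copies.items()
    )


# Both samples contain 10 transcript copies of the gene in total.
sample_a = {"short": 10, "long": 0}
sample_b = {"short": 0, "long": 10}

reads_a = gene_level_reads(sample_a)
reads_b = gene_level_reads(sample_b)
print(reads_a, reads_b, reads_b / reads_a)  # 500 1500 3.0
```

Under these assumptions the gene has identical abundance in both samples, yet the naive gene count differs threefold purely because of an isoform switch.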

I would recommend looking at RSEM, which is a pretty popular quantifier. It provides an "expected count", which as far as I understand assigns reads that are compatible with several isoforms fractionally among them rather than counting them naively, and it also reports FPKM and TPM in addition to counts. (See this thread for more info on expected count vs. raw count.) It supports STAR directly too.
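To give a feel for where an "expected count" comes from, here is a heavily simplified expectation-maximization sketch in Python. It is not RSEM's actual model (which also accounts for fragment lengths, base qualities, and more). The gene, isoform lengths, and read numbers are all invented: isoform B (2000 bp) contains everything in isoform A (1000 bp) plus a 1000 bp region of its own, so some reads fit only B and the rest fit both.

```python
# Hypothetical effective lengths of two isoforms of one gene.
LENGTHS = {"A": 1000, "B": 2000}

# Reads grouped by the set of isoforms they are compatible with.
READ_CLASSES = [
    (("B",), 100),      # reads from the region unique to B
    (("A", "B"), 500),  # reads from the region shared by A and B
]


def em_expected_counts(read_classes, lengths, n_iter=200):
    """Toy EM: split ambiguous reads among isoforms, then re-estimate proportions."""
    transcripts = list(lengths)
    total = sum(n for _, n in read_classes)
    # theta[t] = estimated fraction of all reads that originate from t
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        # E-step: assign each read class fractionally, weighting by theta / length
        counts = {t: 0.0 for t in transcripts}
        for compatible, n in read_classes:
            weights = {t: theta[t] / lengths[t] for t in compatible}
            norm = sum(weights.values())
            for t, w in weights.items():
                counts[t] += n * w / norm
        # M-step: update the read fractions from the fractional counts
        theta = {t: counts[t] / total for t in transcripts}
    return counts


expected = em_expected_counts(READ_CLASSES, LENGTHS)
print({t: round(c, 2) for t, c in expected.items()})
# converges to roughly 400 reads for A and 200 for B
```

In this toy case the 500 ambiguous reads are split about 400 to 100: B's 100 unique reads over 1000 bp imply about 100 more B reads in the equally long shared region, and the remainder goes to A. A naive gene-level counter would just report 600 for the gene.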
