Question: Gene(s) with TPM value(s) of zero in kallisto though reads are mapped
13 months ago, Ankita.narang86 wrote:

Hi,

I have paired-end stranded RNA-seq data (TruSeq Stranded, dUTP protocol). Initially I incorrectly used the --fr-stranded option instead of --rf-stranded. However, even with the correct --rf-stranded option for this protocol, I noticed that the expression of a gene I was looking at is zero (RNA direction is 3' to 5'), although there are enough reads to support mapping when I visualise it in IGV, and it gets a nonzero TPM value with the unstranded option as well. If there are many cases like this, I may lose important candidates. Please guide me; maybe I am missing some important point.
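For readers comparing the three strandedness settings mentioned above, here is a minimal sketch of how the corresponding `kallisto quant` command lines could be assembled. The `--fr-stranded` and `--rf-stranded` flags are kallisto's own; the helper name `kallisto_quant_cmd` and all file paths are hypothetical placeholders, not taken from the original post.

```python
import shlex

# kallisto's strandedness flags; None means no flag (unstranded).
STRAND_FLAGS = {
    "unstranded": None,
    "fr": "--fr-stranded",  # first read forward
    "rf": "--rf-stranded",  # first read reverse (dUTP-style libraries)
}


def kallisto_quant_cmd(index, out_dir, reads1, reads2, strand="rf"):
    """Build the argument list for a paired-end `kallisto quant` run.

    All paths are placeholders supplied by the caller (hypothetical here).
    """
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir]
    flag = STRAND_FLAGS[strand]
    if flag is not None:
        cmd.append(flag)
    cmd += [reads1, reads2]
    return cmd


if __name__ == "__main__":
    # Print one command per setting so the three runs can be compared.
    for mode in ("fr", "rf", "unstranded"):
        args = kallisto_quant_cmd(
            "transcripts.idx", f"quant_{mode}", "R1.fastq.gz", "R2.fastq.gz", mode
        )
        print(shlex.join(args))
```

Running all three settings on the same sample and comparing the TPM of the gene of interest is one way to see whether the zero is tied to the strand option.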

P.S. I posted this on the kallisto-sleuth user group as well but haven't had a reply yet.

Thanks in advance!!

Tags: rna-seq, alignment, gene

Did you check if the mapped reads are uniquely mapped?

written 13 months ago by h.mon

I didn't check this for the region, but I am sure the region has more than 100 reads; it is unlikely that all of them are multi-mapped.

written 13 months ago by Ankita.narang86

Does it overlap with another gene on the opposite strand? Maybe the reads are assigned there, not to your gene. Also, is your gene in the annotation file? Make sure it's not missing.
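The opposite-strand overlap check suggested here can be sketched in a few lines. This is only an illustration under stated assumptions: the `Feature` record, the function name `antisense_overlaps`, and the example coordinates are all hypothetical; in practice the features would be parsed from the GTF/GFF annotation.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def antisense_overlaps(target: Feature, features: List[Feature]) -> List[Feature]:
    """Return features on the same chromosome but the opposite strand
    whose coordinates overlap the target gene."""
    return [
        f
        for f in features
        if f.name != target.name
        and f.chrom == target.chrom
        and f.strand != target.strand
        and f.start <= target.end
        and target.start <= f.end
    ]


if __name__ == "__main__":
    # Made-up annotation for illustration only.
    gene = Feature("geneX", "chr1", 1000, 5000, "-")
    annotation = [
        gene,
        Feature("geneY", "chr1", 4500, 9000, "+"),   # overlaps, opposite strand
        Feature("geneZ", "chr1", 2000, 3000, "-"),   # overlaps, same strand
        Feature("geneW", "chr2", 1000, 5000, "+"),   # different chromosome
    ]
    for hit in antisense_overlaps(gene, annotation):
        print(hit.name, hit.start, hit.end, hit.strand)
```

An empty result for the gene of interest would argue against this explanation; a hit means stranded quantification could be crediting the reads to the overlapping gene.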

written 13 months ago by marina.v.yurieva

Hi,

Thanks a lot for sharing the link to "Why you should use alignment-independent quantification for RNA-Seq". I compared transcript-level and gene-level abundances (since I am working on a novel organism, I am not sure how accurate the transcriptome model is). I used gene and transcript coordinates and compared the results:

On transcript

            FR      RF      Unstranded
Sample1     0.29    0.29    0.57
Sample2     0.29    0.28    0.56
Sample3     0.32    0.32    0.60
Sample4     0.37    0.32    0.67

On gene

            FR      RF      Unstranded
Sample1     0.08    0.49    0.57
Sample2     0.09    0.47    0.56
Sample3     0.12    0.49    0.60
Sample4     0.12    0.55    0.67

For the stranded options, the results differ markedly between transcripts and genes. However, with the unstranded option the results are quite similar for both genes and transcripts. The data is from a TruSeq library (dUTP protocol), and it gives comparable values for both RF and FR (using transcripts). Thanks a lot for helping out; I was struggling to understand this, and the article you shared is very comprehensive.
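The pattern in the two tables can be made explicit with a little arithmetic. The sketch below uses only the values posted above; the helper name `strand_summary` is hypothetical, and no assumption is made about what the values measure beyond their being comparable across columns. For each sample it computes FR + RF (to compare against the unstranded value) and the share of RF within FR + RF.

```python
# Values copied from the tables above: (FR, RF, Unstranded) per sample.
transcript = {
    "Sample1": (0.29, 0.29, 0.57),
    "Sample2": (0.29, 0.28, 0.56),
    "Sample3": (0.32, 0.32, 0.60),
    "Sample4": (0.37, 0.32, 0.67),
}
gene = {
    "Sample1": (0.08, 0.49, 0.57),
    "Sample2": (0.09, 0.47, 0.56),
    "Sample3": (0.12, 0.49, 0.60),
    "Sample4": (0.12, 0.55, 0.67),
}


def strand_summary(table):
    """Per sample: (FR + RF, unstranded value, RF share of FR + RF)."""
    summary = {}
    for sample, (fr, rf, unstranded) in table.items():
        total = fr + rf
        summary[sample] = (round(total, 2), unstranded, round(rf / total, 2))
    return summary


if __name__ == "__main__":
    for label, table in (("transcript", transcript), ("gene", gene)):
        print(label)
        for sample, (total, unstranded, rf_share) in strand_summary(table).items():
            print(f"  {sample}: FR+RF={total}  unstranded={unstranded}  RF share={rf_share}")
```

In both tables FR + RF comes out close to the unstranded value. On genes, RF accounts for roughly 80 to 86% of that sum, while on transcripts the split is close to half and half. One possible reading, offered only as a hypothesis, is that the strand assignments in the transcript model are less reliable than those in the gene coordinates.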

According to this article, alignment-dependent methods are more accurate for quantifying novel isoforms. I want to identify lncRNAs, so I will opt for HISAT2 and Cufflinks.

written 13 months ago by Ankita.narang86 (modified 13 months ago by genomax)

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @h.mon's answer.

written 13 months ago by genomax

13 months ago, h.mon (Brazil) wrote:

If you are using kallisto correctly, you are using a transcriptome as the reference. If this transcriptome is from a well-annotated species (read: human and mouse), there are a lot of isoforms, which means a lot of multi-mapping reads. See for example the post "Why you should use alignment-independent quantification for RNA-Seq":

[...] to achieve accurate estimates for transcripts with less than 50% unique sequence (>86% of transcripts) [...]

So a good proportion of reads are multi-mappers, and because of how kallisto's EM algorithm distributes them among compatible transcripts, it is quite possible for a transcript with lots of mapped reads to get zero counts.

See this post for a similar question:

Big differences between mappings computed by Salmon and quantification

kallisto and Salmon have the same underlying logic, so the answer there applies to your question as well.
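To make the point about the EM algorithm concrete, here is a toy sketch (not kallisto's actual implementation) of expectation-maximization over read equivalence classes. The function name `em_counts`, the two isoforms, their lengths, and the read counts are all made up for illustration. Isoform B is compatible with 100 reads, but every one of them is also compatible with isoform A, and only A has reads that belong to it alone.

```python
def em_counts(classes, eff_len, n_iter=200):
    """Toy EM over equivalence classes.

    classes: list of (tuple of compatible transcript indices, read count)
    eff_len: effective length per transcript
    Returns the estimated read count per transcript.
    """
    n = len(eff_len)
    total = sum(count for _, count in classes)
    alpha = [1.0 / n] * n  # start from a uniform share of reads
    counts = [0.0] * n
    for _ in range(n_iter):
        counts = [0.0] * n
        for txs, count in classes:
            # E-step: split the class's reads in proportion to alpha / length.
            weights = [alpha[t] / eff_len[t] for t in txs]
            norm = sum(weights)
            if norm == 0.0:
                continue
            for t, w in zip(txs, weights):
                counts[t] += count * w / norm
        # M-step: update each transcript's share of the reads.
        alpha = [c / total for c in counts]
    return counts


if __name__ == "__main__":
    A, B = 0, 1
    classes = [
        ((A, B), 100),  # reads compatible with both isoforms
        ((A,), 20),     # reads that only fit isoform A
    ]
    est = em_counts(classes, eff_len=[1000.0, 1000.0])
    print(f"A: {est[A]:.3f}  B: {est[B]:.3f}")
```

In this toy case the shared reads drift toward A at every iteration, so B's estimate shrinks toward zero even though 100 reads pile up on its sequence in a genome browser. Real data are messier, but the same mechanism can produce a zero TPM for a transcript with visible coverage.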

written 13 months ago by h.mon (modified 13 months ago)