Question

Gene(s) with TPM value(s) of zero in kallisto though reads are mapped

5.7 years ago

Ankita.narang86 • 0

Hi,

I have paired-end stranded RNA-seq data (TruSeq stranded dUTP protocol). Initially I incorrectly used the --fr-stranded option instead of --rf-stranded. However, even with the correct --rf-stranded option for this protocol, I noticed that the expression of a gene I was looking at is zero (the RNA direction is 3' to 5'), even though there are enough reads to support it when I visualise the mapping in IGV, and it gets a TPM value with the unstranded option as well. If there are many cases like this, I may lose important candidates. Please advise; maybe I am missing some important point.

P.S. I also posted this on the kallisto-sleuth user group but haven't had a reply yet.

Thanks in advance!!

RNA-Seq alignment gene • 2.8k views

ADD COMMENT • link 5.7 years ago by Ankita.narang86 • 0

Did you check if the mapped reads are uniquely mapped?

ADD REPLY • link 5.7 years ago by h.mon 35k

I didn't check this for the region, but I'm sure it has more than 100 reads, so it is unlikely that all of them are multi-mapped.

ADD REPLY • link 5.7 years ago by Ankita.narang86 • 0

Does it overlap with another gene on the opposite strand? Maybe the reads are mapped there, not to your gene. Also, is your gene in the annotation file? Make sure it's not missing.

ADD REPLY • link 5.7 years ago by marina.v.yurieva ▴ 570

Hi,

Thanks a lot for sharing the link to "Why you should use alignment-independent quantification for RNA-Seq". I compared transcript vs gene abundances (since I'm working on a novel organism, I am not very sure of the accuracy of the transcriptome model either). I used gene and transcript coordinates and compared the results:

On transcript

            FR      RF      Unstranded
Sample1     0.29    0.29    0.57
Sample2     0.29    0.28    0.56
Sample3     0.32    0.32    0.60
Sample4     0.37    0.32    0.67

On gene

            FR      RF      Unstranded
Sample1     0.08    0.49    0.57
Sample2     0.09    0.47    0.56
Sample3     0.12    0.49    0.60
Sample4     0.12    0.55    0.67

For the stranded options, the results are very different between transcripts and genes. However, with the unstranded option the results are quite similar for both. The data is from a TruSeq library (dUTP protocol), and it gives comparable results for both rf and fr (using transcripts). Thanks a lot for helping out; I was struggling to understand this, and the article you shared is very comprehensive.

According to this article, alignment-dependent methods are more accurate for quantifying novel isoforms. I want to identify lncRNAs, so I will opt for HISAT2 and Cufflinks.
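One way to read the two tables above is to ask what share of the stranded signal falls on RF in each case. The sketch below only does arithmetic on the numbers as posted; the post does not say what the values measure, and the helper name `rf_share` is made up for illustration.

```python
# Values copied from the tables above as (FR, RF, Unstranded) per sample.
transcript = {
    "Sample1": (0.29, 0.29, 0.57),
    "Sample2": (0.29, 0.28, 0.56),
    "Sample3": (0.32, 0.32, 0.60),
    "Sample4": (0.37, 0.32, 0.67),
}
gene = {
    "Sample1": (0.08, 0.49, 0.57),
    "Sample2": (0.09, 0.47, 0.56),
    "Sample3": (0.12, 0.49, 0.60),
    "Sample4": (0.12, 0.55, 0.67),
}


def rf_share(fr, rf):
    """Fraction of the stranded signal (FR + RF) that comes from RF."""
    return rf / (fr + rf)


for label, table in (("transcript", transcript), ("gene", gene)):
    for sample, (fr, rf, unstranded) in table.items():
        print(
            f"{label:<10} {sample}  RF share = {rf_share(fr, rf):.2f}  "
            f"FR+RF = {fr + rf:.2f}  unstranded = {unstranded:.2f}"
        )
```

With gene coordinates, most of the stranded signal falls on RF, which is what one would expect from a dUTP library. With transcript coordinates the split is close to even, which may point to a strand problem in the transcript models rather than in the reads. That is an interpretation, not something the thread confirms.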

ADD REPLY • link updated 5.7 years ago by GenoMax 141k • written 5.7 years ago by Ankita.narang86 • 0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @h.mon's answer.

ADD REPLY • link 5.7 years ago by GenoMax 141k

score 0 · Answer 1 · 2018-08-14

If you are using kallisto correctly, you are using a transcriptome as the reference. If this transcriptome is from a well-annotated species (read: human and mouse), there are a lot of isoforms, meaning a lot of multi-mapping reads. See for example the post Why you should use alignment-independent quantification for RNA-Seq:

[...] to achieve accurate estimates for transcripts with less than 50% unique sequence (>86% of transcripts) [...]

So a good proportion of reads are multi-mappers, and because of kallisto's EM algorithm it is entirely possible for a transcript with lots of mapped reads to get zero counts.
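A toy illustration of that effect, as a minimal sketch and not kallisto's actual implementation: it assumes two hypothetical transcripts A and B of equal effective length (so length normalisation is left out), 100 reads compatible with both, and 20 reads compatible only with B. The function name `em_counts` and all the numbers are invented for the example.

```python
# Toy EM over read-to-transcript compatibility.
# Assumption: equal effective lengths, so no length normalisation.
# This is a sketch of the general idea, NOT kallisto's code.

def em_counts(reads, n_transcripts, n_iter=200):
    """Return estimated read counts per transcript.

    reads: list of tuples, each holding the transcript indices
           a read is compatible with.
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # start uniform
    counts = [0.0] * n_transcripts
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        counts = [0.0] * n_transcripts
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                counts[t] += theta[t] / total
        # M-step: abundances are the normalised expected counts.
        n_reads = sum(counts)
        theta = [c / n_reads for c in counts]
    return counts


A, B = 0, 1
reads = [(A, B)] * 100 + [(B,)] * 20  # A has no unique reads
counts = em_counts(reads, n_transcripts=2)
print(f"A: {counts[A]:.3f}  B: {counts[B]:.3f}")
```

Although 100 reads are compatible with A, none are unique to it. Each iteration moves more of the shared reads to B, so A's estimated count shrinks towards zero.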

See this post for a similar question:

Big differences between mappings computed by Salmon and quantification

kallisto and Salmon have the same underlying logic, so the answer there applies to your question as well.