Question: Difference in FPKM values of lncRNA when using different annotation files
5 weeks ago, piyushjo wrote:

Hi,

I am using the GENCODE mouse annotation (release vM17) to quantify genes from RNA-seq data, and I am also interested in lncRNA quantification. I used two different annotation files: the comprehensive annotation file (primary assembly) to quantify all transcripts, and the GENCODE mouse lncRNA annotation file to quantify just the lncRNAs. Since the comprehensive file should contain all of the lncRNAs, I compared the FPKM values calculated with the two annotation files.

What I observe is that the FPKM values differ, although the trend is the same. For example, across three conditions the comprehensive annotation file might give A=2, B=4, C=8, while the lncRNA annotation file gives A=6, B=11, C=23 (made-up numbers, for illustration only). I would like expert opinions on whether I should use the FPKM values from the lncRNA annotation or from the comprehensive file.

My assumption is that with the lncRNA annotation, a read that falls in a region overlapping an mRNA is counted towards the lncRNA, because no mRNA annotation is present. With the comprehensive annotation, the read is instead assigned according to where the overlap is more prominent. This is just my thinking.

Please help me understand which I should choose: comprehensive or lncRNA?

Thanks!!

5 weeks ago, grant.hovhannisyan wrote:

IMHO, when you use only lncRNA annotations, your library size (the total number of mapped reads overlapping features) is always smaller than when you use a comprehensive annotation. Because FPKM divides by that library size, you always get higher FPKM values with the lncRNA-only annotation. There is a recent bioRxiv preprint addressing this exact issue, https://www.biorxiv.org/content/early/2018/01/09/241869, which might be helpful for you (not peer-reviewed, though). The authors claim that pseudoalignment software such as Salmon/kallisto, used together with full genome annotations, has advantages over other combinations of methods.
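The library-size effect described above can be illustrated with a small worked example. This is only a sketch: all numbers are invented, the function name `fpkm` is hypothetical, and it assumes the tool normalizes by the fragments assigned to annotated features (whether a given quantifier does this is tool-dependent).

```python
def fpkm(fragments, length_bp, total_assigned_fragments):
    """FPKM = fragments / (feature length in kb) / (library size in millions)."""
    return fragments * 1e9 / (length_bp * total_assigned_fragments)


# Hypothetical values for a single lncRNA; its own fragment count and length
# are identical in both runs, only the library size (the denominator) changes.
fragments = 200
length_bp = 2_000
total_comprehensive = 30_000_000  # assumed: fragments assigned with the full annotation
total_lncrna_only = 10_000_000    # assumed: fragments assigned with lncRNAs only

fpkm_comprehensive = fpkm(fragments, length_bp, total_comprehensive)
fpkm_lncrna_only = fpkm(fragments, length_bp, total_lncrna_only)
ratio = fpkm_lncrna_only / fpkm_comprehensive

print(fpkm_comprehensive, fpkm_lncrna_only, ratio)
```

Under these assumptions the two FPKM values differ by exactly the ratio of the two library sizes. That would shift every lncRNA in a sample by the same factor, but the factor can differ between samples, so the values are not simply interchangeable.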


To add to what Grant has said, which is perfectly valid, I have to state that FPKM should not be used anymore. There is a very well-cited manuscript ("A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis") that states the following:

An update (12th August 2018):

"The Total Count and RPKM normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis."

...FPKM is essentially the same as RPKM.

My recommendation is to use the comprehensive annotation and to then filter in/out certain gene biotypes from the raw counts that you generate over this comprehensive annotation.

written 5 weeks ago by Kevin Blighe
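The recommendation above (count over the comprehensive annotation, then filter by gene biotype) could be sketched roughly as follows. The gene IDs, counts, and the `LNCRNA_TYPES` set are hypothetical placeholders, and the helper names are invented for illustration. The sketch assumes GENCODE-style `gene_type` attributes in the GTF; the set of biotypes you treat as lncRNA should be checked against the annotation release you actually use.

```python
import re


def gene_types_from_gtf(lines):
    """Map gene_id -> gene_type using the 'gene' records of a GTF."""
    gene_types = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        gene_id = re.search(r'gene_id "([^"]+)"', fields[8])
        gene_type = re.search(r'gene_type "([^"]+)"', fields[8])
        if gene_id and gene_type:
            gene_types[gene_id.group(1)] = gene_type.group(1)
    return gene_types


def filter_counts(counts, gene_types, keep_types):
    """Keep only the genes whose biotype is in keep_types."""
    return {g: c for g, c in counts.items() if gene_types.get(g) in keep_types}


# Toy GTF lines and raw counts (all hypothetical).
gtf_lines = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_type "protein_coding"; gene_name "A";\n',
    'chr1\tHAVANA\tgene\t2000\t2600\t.\t-\t.\tgene_id "GENE_B"; gene_type "lincRNA"; gene_name "B";\n',
    'chr1\tHAVANA\texon\t2000\t2600\t.\t-\t.\tgene_id "GENE_B"; gene_type "lincRNA";\n',
]
raw_counts = {"GENE_A": 1500, "GENE_B": 42}
LNCRNA_TYPES = {"lincRNA", "antisense"}  # assumed subset, adjust to your release

gene_types = gene_types_from_gtf(gtf_lines)
lnc_counts = filter_counts(raw_counts, gene_types, LNCRNA_TYPES)
print(lnc_counts)
```

Filtering after counting means every read was assigned in the presence of all annotated genes, so overlapping mRNA features have already competed for the reads before the lncRNA subset is taken.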

Thanks Kevin and Grant.

I am using StringTie to convert BAMs to GTF files (which contain the FPKM and TPM values). There is also a Python script from the same group that can convert these into read counts, but I guess the algorithm derives the counts from FPKM. Do you have experience with that? I just want to make sure the algorithm doesn't suffer from the same bias.

written 5 weeks ago by piyushjo

You have (at least) two trustworthy and reliable ways to convert your BAM files to read counts:

  1. Use featureCounts: the most straightforward way, which simply counts the number of reads overlapping features in a GFF/GTF file. You will generate gene-level quantifications.
  2. If you have used StringTie to generate FPKM/TPM values (StringTie makes transcript-level quantifications), you can use tximport to convert TPMs to read counts. If you are planning to do differential gene expression analysis, the second option is more advisable according to https://f1000research.com/articles/4-1521/v1
written 5 weeks ago by grant.hovhannisyan
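On the earlier question of whether StringTie's conversion script derives counts from FPKM: as far as I understand the StringTie documentation, the counts are estimated from the per-transcript coverage rather than from FPKM, roughly as coverage × transcript length / read length. The sketch below only illustrates that arithmetic; the function name and numbers are hypothetical, the rounding step is an assumption, and the formula should be verified against the script version you actually run.

```python
import math


def coverage_to_count(coverage, transcript_length, read_length):
    """Approximate read count from average per-base coverage.

    Assumed formula: coverage * transcript_length / read_length,
    rounded up to a whole number of reads.
    """
    return int(math.ceil(coverage * transcript_length / read_length))


# Hypothetical transcript: 1,500 bp long, average coverage 12.5x, 75 bp reads.
estimated_reads = coverage_to_count(12.5, 1500, 75)
print(estimated_reads)
```

If that is what the script does, the estimate involves no library-size term, so it would not inherit the FPKM denominator issue. It still depends on which transcripts were in the annotation when coverage was computed.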