Question: Calculate FPKM value from bam files
gravatar for EileenXu
5.3 years ago by
United States
EileenXu0 wrote:

I have some bam files of RNAseq data, and I want to calculate the FPKM values of each gene. However, I couldn't know how the bam files are generated. I tried to use cufflinks to do this, but the corresponding values are weird. Is there any suggestions to me? I'm sorry that I'm new to this field.

rna-seq software error • 8.8k views
ADD COMMENTlink modified 5.3 years ago by alolex910 • written 5.3 years ago by EileenXu0

First, why do you want to calculate FPKM if you are looking at gene level data ? Can you post the command that you used for cufflinks ? And why do you think the corresponding values are weird ?

ADD REPLYlink written 5.3 years ago by geek_y11k

Thanks for your reply.

The command I used for cufflinks is:

cufflinks -p 8 --library-type fr-firststrand -G ./hg19.gtf $file

I think the FPKM values I got are weird because there are some transcripts got values like 1.01111e-310. Besides, I used bedcoverage to look at the number of reads at each position, and saw some transcripts with nonzero number of reads got zero FPKM value, and some transcripts with zero number of reads got nonzero FPKM value.

My question now is: If I simply want to get the expression level of each gene, could I just calculate it myself using the definition of RPKM? Is cufflinks using some specific technique?

ADD REPLYlink written 5.3 years ago by EileenXu0
gravatar for seidel
5.3 years ago by
United States
seidel7.1k wrote:

Are you sure you can not determine how the BAM files were generated? One of the conventions of a BAM file is that it contains the command used to generate it. The command is stored in the file header, and can be seen with:

samtools view -H yourFile.bam

One of the last lines of the header should llok something like:

@PG ID:TopHat VN:2.0.13 CL:/n/local/bin/tophat --GTF SacCer3.gtf --library-type fr-firststrand


Occasionally some BAM files will not contain this information. You should confirm that your files do not have it.

Cufflinks should calculate FPKM values for you. But as Geek_y mentions, saying the values are "weird" is uninformative. You'll have to say exactly what you mean. You can try calculating some values by hand - you can use samtools to pull out reads covering a given gene (see samtools view), count them, and manually calculate the FPKM using the total number of mapped reads returned by samtools idxstats to see if things are making sense. You could also examine some of the read flags to see if the alignment was done using parameters you don't agree with (i.e. examine map quality, number of hits etc.).

Lastly, if you simply want to remap everything yourself, but all you have are the BAM files, you can convert them back to fastq, and map them to your liking:

bedtools bamtofastq -i input.bam -fq output.fq
ADD COMMENTlink modified 12 months ago by _r_am31k • written 5.3 years ago by seidel7.1k

Thank you very much for your information.

I really can not get the information of how the bam files were generated even using the command you referred to.

As I replied to Geek_y, I think the FPKM values are weird based on two facts: 1. there are some values as low as 1.01111e-310; 2. some zero and nonzero values are conflicted with the number of reads I got through bedcoverage (I'm confident with this value after compare them with the results I got with samtools).

Do you know the technique used in the cufflinks? Could I simply calculate the RPKM value based on its definition?

Thank you again for your reply.

ADD REPLYlink written 5.3 years ago by EileenXu0

Read the cufflinks paper and weep. It's very complicated. Who said it was a good idea to use?

ADD REPLYlink modified 12 months ago by _r_am31k • written 5.3 years ago by karl.stamm3.9k
gravatar for Brian Bushnell
5.3 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

If you map to the transcriptome, you can get RPKM values (as well as raw read counts) directly from BBMap or Seal.  This won't work for the genome, though, without an additional tool.

ADD COMMENTlink modified 5.3 years ago • written 5.3 years ago by Brian Bushnell17k

dear Brian can we get FPKM directly d for paired end reads from bbmap or bbmap just give us RPKM?

ADD REPLYlink written 3.0 years ago by rahmati.razieh8330
gravatar for TriS
5.3 years ago by
United States, Buffalo
TriS4.2k wrote:

you could try using RSEM which will also give you the TPM which it is considered to be more reliable/precise than RPKM

ADD COMMENTlink modified 5.3 years ago • written 5.3 years ago by TriS4.2k
gravatar for alolex
5.3 years ago by
United States
alolex910 wrote:

I would not use cufflinks to do this.  First you need to generate your count table, which you can do using HTseq-count, then take a look at this post ( How To Calculate The Rpkm For The Count Tables Of Rna Seq Data ) to calculate the FPKM values yourself using the .gtf file you listed above.  One question though, if you don't know how the BAM file was created, then how do you know it was aligned to Hg19?  It might be helpful to post the BAM file header here so others can see what your looking at.


ADD COMMENTlink written 5.3 years ago by alolex910
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 841 users visited in the last hour