Question: Calculate FPKM value from bam files
0
gravatar for EileenXu
4.1 years ago by
EileenXu0
United States
EileenXu0 wrote:

I have some bam files of RNAseq data, and I want to calculate the FPKM values of each gene. However, I couldn't know how the bam files are generated. I tried to use cufflinks to do this, but the corresponding values are weird. Is there any suggestions to me? I'm sorry that I'm new to this field.

rna-seq software error • 7.2k views
ADD COMMENTlink modified 4.1 years ago by alolex900 • written 4.1 years ago by EileenXu0

First, why do you want to calculate FPKM if you are looking at gene level data ? Can you post the command that you used for cufflinks ? And why do you think the corresponding values are weird ?

ADD REPLYlink written 4.1 years ago by geek_y9.8k

Thanks for your reply.

The command I used for cufflinks is:

cufflinks -p 8 --library-type fr-firststrand -G ./hg19.gtf $file

I think the FPKM values I got are weird because there are some transcripts got values like 1.01111e-310. Besides, I used bedcoverage to look at the number of reads at each position, and saw some transcripts with nonzero number of reads got zero FPKM value, and some transcripts with zero number of reads got nonzero FPKM value.

My question now is: If I simply want to get the expression level of each gene, could I just calculate it myself using the definition of RPKM? Is cufflinks using some specific technique?

ADD REPLYlink written 4.1 years ago by EileenXu0
0
gravatar for seidel
4.1 years ago by
seidel6.8k
United States
seidel6.8k wrote:

Are you sure you can not determine how the BAM files were generated? One of the conventions of a BAM file is that it contains the command used to generate it. The command is stored in the file header, and can be seen with:

samtools view -H yourFile.bam

One of the last lines of the header should llok something like:

@PG ID:TopHat VN:2.0.13 CL:/n/local/bin/tophat --GTF SacCer3.gtf --library-type fr-firststrand ...etc.

Occasionally some BAM files will not contain this information. You should confirm that your files do not have it.

Cufflinks should calculate FPKM values for you. But as Geek_y mentions, saying the values are "wierd" is uninformative. You'll have to say exactly what you mean. You can try calculating some values by hand - you can use samtools to pull out reads covering a given gene (see samtools view), count them, and manually calculate the FPKM using the total number of mapped reads returned by samtools idxstats to see if things are making sense. You could also examine some of the read flags to see if the alignment was done using parameters you don't agree with (i.e. examine map quality, number of hits etc.).

Lastly, if you simply want to remap everything yourself, but all you have are the BAM files, you can convert them back to fastq, and map them to your liking:

bedtools bamtofastq -i input.bam -fq output.fq

 

ADD COMMENTlink written 4.1 years ago by seidel6.8k

Thank you very much for your information.

I really can not get the information of how the bam files were generated even using the command you referred to.

As I replied to Geek_y, I think the FPKM values are weird based on two facts: 1. there are some values as low as 1.01111e-310; 2. some zero and nonzero values are conflicted with the number of reads I got through bedcoverage (I'm confident with this value after compare them with the results I got with samtools).

Do you know the technique used in the cufflinks? Could I simply calculate the RPKM value based on its definition?

Thank you again for your reply.

ADD REPLYlink written 4.1 years ago by EileenXu0

Read the cufflinks paper and weep. It's very complicated. Who said it was a good idea to use?

 

ADD REPLYlink written 4.1 years ago by karl.stamm3.5k
0
gravatar for Brian Bushnell
4.1 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

If you map to the transcriptome, you can get RPKM values (as well as raw read counts) directly from BBMap or Seal.  This won't work for the genome, though, without an additional tool.

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by Brian Bushnell16k

dear Brian can we get FPKM directly d for paired end reads from bbmap or bbmap just give us RPKM?

ADD REPLYlink written 21 months ago by rahmati.razieh8330
0
gravatar for TriS
4.1 years ago by
TriS3.9k
United States, Buffalo
TriS3.9k wrote:

you could try using RSEM which will also give you the TPM which it is considered to be more reliable/precise than RPKM

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by TriS3.9k
0
gravatar for alolex
4.1 years ago by
alolex900
United States
alolex900 wrote:

I would not use cufflinks to do this.  First you need to generate your count table, which you can do using HTseq-count, then take a look at this post ( How To Calculate The Rpkm For The Count Tables Of Rna Seq Data ) to calculate the FPKM values yourself using the .gtf file you listed above.  One question though, if you don't know how the BAM file was created, then how do you know it was aligned to Hg19?  It might be helpful to post the BAM file header here so others can see what your looking at.

 

ADD COMMENTlink written 4.1 years ago by alolex900
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2627 users visited in the last hour