Question

Downsampling an RNA-seq BAM file so that all genes have 1 FPKM

8.1 years ago

NotARobot • 0

Hi,

I have a BAM file of reads aligned to transcripts, generated by RSEM (deweylab.biostat.wisc.edu/rsem/). Different genes have different FPKM values.

Does anybody know a straightforward way to downsample the reads so that all genes have exactly 1 FPKM, as estimated by RSEM?

Cheers

RNA-Seq next-gen

updated 8.1 years ago by Michael 54k • written 8.1 years ago by NotARobot • 0

I guess you need to state the purpose of doing so. Alternatively, could you just simulate a dataset in which all genes (or all genes of interest) have 1 FPKM?

8.1 years ago by GouthamAtla 12k
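If one goes the simulation route, the FPKM definition (reads × 10^9 / (length in bp × total mapped reads)) imposes a constraint worth knowing. Assuming every mapped read is assigned to one of the simulated genes, giving every gene the same FPKM F forces reads to be allocated in proportion to gene length, and summing over genes gives F = 10^9 / (sum of gene lengths in bp). So the shared value is fixed by the lengths, not chosen freely; 1 FPKM for every gene would need the lengths to add up to 10^9 bp, or extra reads falling outside these genes. The helper `equal_fpkm_counts` below is a hypothetical sketch using nominal lengths; real quantifiers may use effective lengths instead.

```python
def equal_fpkm_counts(lengths_bp, total_reads):
    """Hypothetical helper: read counts giving every gene the same FPKM.

    Assumes all mapped reads fall on the listed genes, so counts must be
    proportional to length, and the common FPKM is forced to be
    1e9 / sum(lengths_bp) regardless of library size.
    """
    total_len = sum(lengths_bp)
    counts = [total_reads * L / total_len for L in lengths_bp]
    common_fpkm = 10**9 / total_len
    return counts, common_fpkm


# Three toy genes (nominal lengths in bp) and a 5-million-read library.
counts, f = equal_fpkm_counts([1_000, 2_000, 2_000], 5_000_000)
print(counts, f)  # [1000000.0, 2000000.0, 2000000.0] 200000.0
```

For an actual simulation the counts would need rounding to whole reads, which introduces small deviations from exactly equal FPKM.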

The reason for downsampling is to check how well RSEM estimates percent isoform usage at different expression levels: would the estimated percent isoform usage of a gene be the same at 1 FPKM as at 20 FPKM?

8.1 years ago by NotARobot • 0
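To get a feel for this question without running RSEM, here is a minimal sketch under a strong simplifying assumption: every read is unambiguously assigned to one of two isoforms (RSEM instead apportions ambiguous reads with an EM algorithm, which this does not model). Downsampling is modeled as keeping each read independently with probability `keep_prob`; the helper `isoform_fraction_spread` and the 1400/600 read counts are hypothetical. Under this assumption the expected isoform fraction does not change with depth, but its spread across random subsamples grows.

```python
import random
import statistics


def thin(count, keep_prob, rng):
    """Keep each of `count` reads independently with probability keep_prob."""
    return sum(1 for _ in range(count) if rng.random() < keep_prob)


def isoform_fraction_spread(iso_counts, keep_prob, trials=200, seed=1):
    """Hypothetical helper: mean and stdev of isoform 1's fraction of the
    gene's reads over repeated random downsamplings.

    Assumes reads are unambiguously assigned to isoforms (no EM step).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        kept = [thin(c, keep_prob, rng) for c in iso_counts]
        total = sum(kept)
        if total > 0:  # skip the (unlikely) case where no reads survive
            fractions.append(kept[0] / total)
    return statistics.mean(fractions), statistics.stdev(fractions)


# Toy gene: 70% / 30% isoform usage, 2000 reads at full depth.
counts = [1400, 600]
for p in (1.0, 0.05):  # full depth vs. roughly 1/20 of the reads
    mean, sd = isoform_fraction_spread(counts, p)
    print(f"keep={p:.2f}  mean fraction={mean:.3f}  sd={sd:.3f}")
```

At full depth the fraction is exactly 0.7 with no spread; keeping 5% of the reads (about 100 for this gene) gives a standard deviation of roughly sqrt(0.7 × 0.3 / 100) ≈ 0.046 around the same mean.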

score 1 · Answer 1 · 2016-03-19

Excuse me if I am misunderstanding your approach completely, but it makes no sense to me: why would you want all measurements to be the same? Why measure at all, then? If the genes start out with different values, no random downsampling can make all of them identical afterwards, even allowing for rounding error. In fact, FPKM values should stay mostly unchanged by downsampling, because FPKM divides by the total number of mapped reads, and that number shrinks by the same factor as each gene's read count.
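The claim that uniform random downsampling leaves FPKM roughly unchanged can be checked with a small simulation. This is a sketch, not RSEM: it assumes a hypothetical toy library in which three genes hold all mapped reads, models downsampling as keeping every read independently with the same probability, and recomputes FPKM with the reduced library size.

```python
import random


def fpkm(count, length_bp, total_reads):
    """FPKM = reads * 10^9 / (transcript length in bp * total mapped reads)."""
    return count * 1e9 / (length_bp * total_reads)


def downsample(counts, fraction, seed=0):
    """Uniform random downsampling: keep every read independently with
    probability `fraction`, regardless of which gene it came from."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


# Hypothetical toy library: three genes that together hold all mapped reads.
counts = [500_000, 400_000, 100_000]
lengths = [1_000, 2_000, 1_000]

before = [fpkm(c, L, sum(counts)) for c, L in zip(counts, lengths)]
kept = downsample(counts, 0.1)  # keep about 10% of the reads
after = [fpkm(k, L, sum(kept)) for k, L in zip(kept, lengths)]

for b, a in zip(before, after):
    print(f"before {b:10.0f}  after {a:10.0f}")
```

The three genes keep their distinct FPKM values (up to sampling noise of about 1% here); downsampling the whole library never pulls them toward a common value.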

Say gene a is 1 kb long, you sequenced 1 million reads, and a has 1000 reads, while gene b is also 1 kb long but has 500 reads; then a has 1000 FPKM and b has 500 FPKM. If you now downsample, truly at random, 10% of all reads (100,000 reads), the expected read counts for a and b are 100 and 50, respectively. But because you now have only 100,000 reads in total, the FPKMs for a and b computed from these expected counts are still 1000 and 500.
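The arithmetic of this example, written out as a short sketch (lengths in bp, library sizes in reads; `fpkm` is simply the standard FPKM formula, not RSEM code):

```python
def fpkm(count, length_bp, total_reads):
    """FPKM = count * 10^9 / (length_bp * total_reads); integer arithmetic
    is kept until the final division so these toy values come out exact."""
    return count * 10**9 / (length_bp * total_reads)


# Genes a and b: both 1 kb long, full library of 1 million reads.
full = {"a": fpkm(1000, 1000, 1_000_000), "b": fpkm(500, 1000, 1_000_000)}
# Expected counts after keeping 10% of the reads (100,000 reads in total).
down = {"a": fpkm(100, 1000, 100_000), "b": fpkm(50, 1000, 100_000)}

print(full)  # {'a': 1000.0, 'b': 500.0}
print(down)  # {'a': 1000.0, 'b': 500.0}
```

Both the numerator (gene count) and the denominator (library size) scale by the same factor of 0.1, so it cancels.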