Downsampling an RNA-seq BAM file so that all genes have 1 FPKM
8.1 years ago
NotARobot • 0

Hi,

I have a BAM file of read alignments to the transcripts, generated by RSEM (deweylab.biostat.wisc.edu/rsem/). Different genes have different FPKM values.

Does anybody know a straightforward way to downsample the reads so that all genes have exactly 1 FPKM according to RSEM?

Cheers

RNA-Seq next-gen • 2.1k views

I guess you need to state the purpose of doing so. Alternatively, you could just simulate a dataset in which all genes (or all genes of interest) have 1 FPKM?


The reason for downsampling is to check how well RSEM estimates percent isoform usage at different expression levels: would the percent isoform usage be the same at 1 FPKM as at 20 FPKM for the same gene?

8.1 years ago
Michael 54k

Excuse me if I am misunderstanding your approach completely, but it makes no sense to me. Why would you want all measurements to be the same, and why then measure at all? If the genes have different measurements, no random downsampling can make all measurements identical afterwards, even allowing for rounding error. With FPKM, the values should even stay mostly unchanged by downsampling, because FPKM divides by the total number of reads, which shrinks by the same proportion.

Say gene a is 1 kb long, you sequenced 1 million reads, and a has 1000 reads, while gene b is also 1 kb but has 500 reads. Then gene a has 1000 FPKM and b has 500 FPKM. Now say you downsample, truly randomly, to 10% of all reads (100,000 reads). What is the expected number of reads for a and b? It is 100 and 50 respectively, but you also have only 100k reads in total, so at the expected read counts the FPKMs for a and b are still 1000 and 500.
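The arithmetic above can be checked with a few lines of Python. This is only a sketch using the textbook FPKM formula on raw gene length (RSEM itself works with effective lengths and estimated counts, so its numbers would differ slightly); the function name `fpkm` and the variable names are made up for illustration.

```python
def fpkm(gene_reads, gene_length_bp, total_reads):
    """Textbook FPKM: fragments per kilobase per million mapped fragments."""
    return gene_reads * 1e9 / (gene_length_bp * total_reads)

total_reads = 1_000_000
# gene name -> (read count, gene length in bp), as in the example above
genes = {"a": (1000, 1000), "b": (500, 1000)}
fraction = 0.10  # keep 10% of all reads, uniformly at random

# FPKM on the full library
before = {g: fpkm(r, length, total_reads) for g, (r, length) in genes.items()}

# Expected counts after uniform downsampling: every count and the library
# size shrink by the same factor, so the ratio is unchanged.
sub_total = round(total_reads * fraction)
after = {
    g: fpkm(round(r * fraction), length, sub_total)
    for g, (r, length) in genes.items()
}

print(before)
print(after)
```

Real subsampling would add sampling noise around these expected values, which matters most for lowly expressed genes.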


The reason for downsampling is to check how well RSEM estimates percent isoform usage at different expression levels. The downsampling need not be truly random; it just needs to be random within a gene (random positions within the gene).

Maybe I should have used a better word than downsampling. It is gene-specific sampling of reads, so that all genes have equal power to detect isoform usage.
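A minimal sketch of what such gene-specific sampling could look like, assuming you already have, per gene, its current FPKM and the list of read names assigned to it. The helpers `keep_fraction` and `subsample_reads` are hypothetical names, not part of RSEM or any other tool. Sampling is done on read names so that both mates of a pair would be kept or dropped together.

```python
import random

def keep_fraction(current_fpkm, target_fpkm=1.0):
    """Fraction of a gene's reads to keep to reach target_fpkm.

    Capped at 1.0 because reads cannot be added to a gene that is
    already below the target.
    """
    if current_fpkm <= 0:
        return 0.0
    return min(1.0, target_fpkm / current_fpkm)

def subsample_reads(read_names, fraction, seed=0):
    """Randomly keep about `fraction` of the read names of one gene."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    n_keep = round(len(read_names) * fraction)
    return rng.sample(read_names, n_keep)

# Illustrative input: one gene at 20 FPKM with 2000 assigned fragments.
read_names = [f"read{i}" for i in range(2000)]
frac = keep_fraction(20.0, target_fpkm=1.0)
kept = subsample_reads(read_names, frac)
print(frac, len(kept))
```

One caveat: the kept fraction targets 1 FPKM relative to the original library size. If the subsampled BAM is re-quantified, the total read count used for normalization also shrinks, so the reported FPKM values would be equal across genes but not necessarily 1.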
