The all.reads.bam file contains mapped RNA-seq reads of several kinds:
- exon:exon junction
- exon body
- intron body
- exon:intron junction
Q1: When calculating RPKM for a given RefSeq gene using reads from all of these positions, will the following command count only the exon:exon junction reads and ignore all the other reads?
coverageBed -abam all.reads.bam -b refseq.genes.BED12.bed -s -split > coverage.bed
I'm confused by the manual (page 62):
When dealing with RNA-seq reads, for example, one typically wants to only tabulate coverage for the portions of the reads that come from exons (and ignore the interstitial intron sequence). The -split command allows for such coverage to be performed.
If -split is set and an exon:exon read (for example, 30M3000N46M) is present in the -abam BAM file, the 3000N gap will NOT be wrongly intersected when running intersectBed. But what about coverageBed? I hope that the 3000N gap is not counted, which would make sense, but I also hope that the intron-body reads and the other reads are NOT ignored.
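To make clear what I mean by the 3000N not being counted, here is a minimal Python sketch of how I understand a spliced alignment being split into separate blocks. This is only my illustration, not the BEDTools implementation: the function name aligned_blocks and the start position 1000 are made up, and treating D as part of a block is my own assumption.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_blocks(start, cigar):
    """Return 0-based, half-open reference blocks, split at every N operation."""
    blocks = []
    pos = start
    block_start = None
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "M=XD":
            # These operations consume reference bases inside a block
            # (counting D here is an assumption of this sketch).
            if block_start is None:
                block_start = pos
            pos += length
        elif op == "N":
            # Skipped region (intron): close the current block, jump over the gap.
            if block_start is not None:
                blocks.append((block_start, pos))
                block_start = None
            pos += length
        # I, S, H and P do not consume reference bases, so they are ignored.
    if block_start is not None:
        blocks.append((block_start, pos))
    return blocks

# Hypothetical read starting at reference position 1000.
blocks = aligned_blocks(1000, "30M3000N46M")
print(blocks)
```

With this view, only the 30 bp and 46 bp blocks would contribute coverage, while a read without any N (for example an intron-body read) stays a single block and is still counted.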
Q2: If one just wants to calculate RPKM for exons, does that mean one should prepare a -b file listing all the exons and run the command like this:
coverageBed -abam all.reads.bam -b all.exons.bed -s > coverage.bed
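For clarity, by RPKM I mean reads per kilobase of feature per million mapped reads. Below is a minimal Python sketch of the calculation I have in mind, assuming the read count per feature comes from the coverage output and the total number of mapped reads is obtained separately. The function name rpkm and the numbers are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical numbers: 500 reads on a 2,000 bp exon, 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)
```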
Q3: How can I calculate RPKM for given genes using only the reads that overlap their exons? We all know that a BED12 file records the exon information (start and length) for a given RefSeq gene. Can BEDTools do this magic? Note that this calculation is different from the one in Q1.
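To illustrate the exon information I am referring to in Q3, here is a minimal Python sketch that expands one BED12 record into its exon intervals and sums their lengths, which is the length I would like to use in the RPKM denominator. The function name bed12_exons and the example record (including the name NM_hypothetical) are invented for illustration. It relies only on the standard BED12 columns blockCount, blockSizes and blockStarts.

```python
def bed12_exons(line):
    """Expand one tab-separated BED12 line into (chrom, start, end, name, strand) exons."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start = fields[0], int(fields[1])
    name, strand = fields[3], fields[5]
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated, often with a trailing comma.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    exons = []
    for i in range(block_count):
        # blockStarts are relative to chromStart.
        exon_start = chrom_start + starts[i]
        exons.append((chrom, exon_start, exon_start + sizes[i], name, strand))
    return exons

# Hypothetical three-exon gene on chr1.
record = "chr1\t1000\t5000\tNM_hypothetical\t0\t+\t1200\t4800\t0\t3\t500,300,1000,\t0,1500,3000,"
exons = bed12_exons(record)
exon_length = sum(end - start for _, start, end, _, _ in exons)
print(exons)
print(exon_length)
```

For this made-up record the summed exon length is much shorter than the 4,000 bp gene span, which is why I think the choice of length matters for the gene-level RPKM.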
Thank you, and looking forward to your replies.