Entering edit mode
12.2 years ago
Pavel Senin
★
1.9k
Hi folks:
I need programmatically get RNASeq coverage information out of BAM file into R and, unfortunately, my old genomic sequencing code (below) doesn't fit anymore. Is it possible to efficiently modify it, so it will handle junctions?
#
# 'scan' the bam file reading: 1) position alignment starts 2) read width (query), all this for a region of interest,
# and discard unmapped reads
param <- ScanBamParam(what=c('pos', 'qwidth'),
which=GRanges(seqName,IRanges(start=startPos, end=endPos)),
flag=scanBamFlag(isUnmappedQuery=FALSE))
bam <- scanBam(bamFName,index=bamIdx,param=param)[[1]]
#
# get the coverage as a run-length encoded vector
#
coverage <- coverage(IRanges(bam[["pos"]], width=bam[["qwidth"]]))
Edit: the code is very simple and appears not to be "junction-aware" - it doesnt look into CIGAR for insertions which are introns - it simply computes a coverage value by summing up _fragments_ of size _read length_ which start at _position_ which is in BAM file.
EDIT:
Solved. Thanks, Jeremy! :
param <- ScanBamParam(which=GRanges(seqName,IRanges(start=startPos, end=endPos)),
flag=scanBamFlag(isNotPrimaryRead=FALSE))
rg = readGappedAlignmentPairs(bamFName,index=bamIdx,param=param)
cvg <- coverage(rg)[1]
Not totally sure what you mean by 'junction aware', are interested in exon-intron junctions? I don't know what coordinates you feed in
which=GRanges(seqName,IRanges(start=startPos, end=endPos))
. If these are gene coordinates, the first step would be to get exon coordinates in, still the per base coverage wouldn't change. So, I think the idea is just to look up the coverage at those coordinates where you need to know it.Sorry, it wasn't clear, will try to fix right now