Is it possible to get reference allele info at each position within a range from a bam file in Rsamtools?
0
0
Entering edit mode
4.7 years ago
mrz132435 ▴ 20

How do I get info about which base matches to the reference genome from a bam file? Specifically, is it possible to do this in Rsamtools? Using the code below (where 'bam' and 'bai' have already been defined as the bam and bai file paths) I can use pileup get a data frame that has the counts for each base at each position in the specified range:

chr <- "5"
start <- 112164646
end <- 112164666
p_param <- PileupParam(distinguish_nucleotides=TRUE,
                   distinguish_strands=FALSE,
                   min_base_quality=10)
sbp <- ScanBamParam(which=GRanges(as.character(chr), IRanges(IRanges(start, end))))
res <- pileup(bam, index = bai, scanBamParam=sbp, pileupParam=p_param)

And what I get looks like this:

seqnames       pos nucleotide count           which_label
   5 112164646          G   244 5:112164646-112164666
   5 112164646          T    54 5:112164646-112164666
   5 112164647          A   298 5:112164646-112164666
   5 112164648          A   297 5:112164646-112164666

Is there any way to get it to tell me whether a base matches the reference base at each locus? The Rdocumentation link (https://www.rdocumentation.org/packages/Rsamtools/versions/1.24.0/topics/pileup) seems to suggest the possibility ("The ‘=’ nucleotide code in the SEQ field (to mean ‘identical to reference genome’) is supported by pileup, such that a match with the reference genome is counted separately in the results if distinguish_nucleotides is TRUE."). Thanks for any assistance.

R sequencing • 720 views
ADD COMMENT

Login before adding your answer.

Traffic: 1349 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6