Question

Is it possible to get reference allele info at each position within a range from a bam file in Rsamtools?

0

Entering edit mode

4.7 years ago

mrz132435 ▴ 20

How do I get info about which base matches to the reference genome from a bam file? Specifically, is it possible to do this in Rsamtools? Using the code below (where 'bam' and 'bai' have already been defined as the bam and bai file paths) I can use pileup get a data frame that has the counts for each base at each position in the specified range:

chr <- "5"
start <- 112164646
end <- 112164666
p_param <- PileupParam(distinguish_nucleotides=TRUE,
                   distinguish_strands=FALSE,
                   min_base_quality=10)
sbp <- ScanBamParam(which=GRanges(as.character(chr), IRanges(IRanges(start, end))))
res <- pileup(bam, index = bai, scanBamParam=sbp, pileupParam=p_param)

And what I get looks like this:

seqnames       pos nucleotide count           which_label
   5 112164646          G   244 5:112164646-112164666
   5 112164646          T    54 5:112164646-112164666
   5 112164647          A   298 5:112164646-112164666
   5 112164648          A   297 5:112164646-112164666

Is there any way to get it to tell me whether a base matches the reference base at each locus? The Rdocumentation link (https://www.rdocumentation.org/packages/Rsamtools/versions/1.24.0/topics/pileup) seems to suggest the possibility ("The ‘=’ nucleotide code in the SEQ field (to mean ‘identical to reference genome’) is supported by pileup, such that a match with the reference genome is counted separately in the results if distinguish_nucleotides is TRUE."). Thanks for any assistance.

R sequencing • 720 views

ADD COMMENT • link 4.7 years ago by mrz132435 ▴ 20