I want to extract the nucleotide (C or T) for each reads from BAM files (BS-Seq) in some interesting positions (CpG sites). Do you know any R script can do it with well speed?
As many colleagues mentioned that why I insist to use R. Sure, I can use perl and samtools to extract these informations, but It would be complicated to use many programming language. I know maybe be R would be slow to deal with this problem, but anyway, it is in the one platform and it would be easy for the user. Acutally, I have finished the Perl script to do it. I just want to transfer the code to R.
library("Rsamtools") library("GenomicRanges") library("GenomicAlignments") setwd("/home/shg047/bam") # quickBamFlagSummary("Indx16_S12.sorted.clipped.bam", main.groups.only=TRUE) flag1 <- scanBamFlag(isFirstMateRead=NA, isSecondMateRead=NA,isDuplicate=FALSE, isNotPassingQualityControls=FALSE) param1<-ScanBamParam(flag=flag1, what="seq") gal1 <- readGAlignments("Indx16_S12.sorted.clipped.bam", use.names=TRUE, param=param1) seq<-mcols(gal1)$seq is_on_minus <- as.logical(strand(gal1) == "-") seq[is_on_minus] <- reverseComplement(seq[is_on_minus]) bf <- BamFile("Indx16_S12.sorted.clipped.bam", asMates=F) paln <- readGAlignmentsList(bf) j <- junctions(paln, with.revmap=TRUE) roi <- GRanges("chr1", IRanges(10056, width=500)) findOverlaps(paln,roi) j_overlap <- paln[paln %over% roi] paln[j_overlap$revmap[]]