Question: R: extract interest nucleotide from BAM files (BS-Seq)
gravatar for Shicheng Guo
4.7 years ago by
Shicheng Guo8.2k
Shicheng Guo8.2k wrote:

Hi colleagues, 

I want to extract the nucleotide (C or T) for each reads from BAM files (BS-Seq) in some interesting positions (CpG sites). Do you know any R script can do it with well speed? 

As many colleagues mentioned that why I insist to use R. Sure, I can use perl and samtools to extract these informations, but It would be complicated to use many programming language. I know maybe be R would be slow to deal with this problem, but anyway, it is in the one platform and it would be easy for the user. Acutally, I have finished the Perl script to do it. I just want to transfer the code to R. 


# quickBamFlagSummary("Indx16_S12.sorted.clipped.bam", main.groups.only=TRUE)
flag1 <- scanBamFlag(isFirstMateRead=NA, isSecondMateRead=NA,isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param1<-ScanBamParam(flag=flag1, what="seq")
gal1 <- readGAlignments("Indx16_S12.sorted.clipped.bam", use.names=TRUE, param=param1)
is_on_minus <- as.logical(strand(gal1) == "-")
seq[is_on_minus] <- reverseComplement(seq[is_on_minus])

bf <- BamFile("Indx16_S12.sorted.clipped.bam", asMates=F)
paln <- readGAlignmentsList(bf)
j <- junctions(paln, with.revmap=TRUE)
roi <- GRanges("chr1", IRanges(10056, width=500))
j_overlap <- paln[paln %over% roi]

Best regards, 



alignment • 1.4k views
ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by Shicheng Guo8.2k

Do you absolutely have to do this with R? It's great for some things, but terrible for things like this.

ADD REPLYlink written 4.7 years ago by Devon Ryan95k

R is intended for statistical data analysis. You can use samtools mpileup to get the base calls for a base position

ADD REPLYlink written 4.7 years ago by Irsan7.1k
gravatar for glihm
4.7 years ago by
glihm620 wrote:

Hi Shicheng Guo, 

like posted by Devon Ryan, you have lot's of opportunities in order to do that. The thing is: "What orders are?" :D

1) You have to do it with R -> This package from bioconductor will be helpful! 

2) You want to try an other solution, perhaps you have some knowledge with Python -> pysam is your man.

3) With other languages, you can use SamTools in order to extract sequences and then process them with bash scripts for example or languages like C -> samTools.

Depending on the R bioconductor package performance, other programming languages may be faster and easier to extend/develop than R scripts. 
However, if you are a pro R, enjoy it with Rsamtools package!

ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by glihm620

More specifically, if R isn't needed then PileOMeth might be the simplest tool.

ADD REPLYlink written 4.7 years ago by Devon Ryan95k

I did not know this one, thank you!

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by glihm620
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 922 users visited in the last hour