Question: R: extract interest nucleotide from BAM files (BS-Seq)
gravatar for Shicheng Guo
3.5 years ago by
Shicheng Guo7.4k
Shicheng Guo7.4k wrote:

Hi colleagues, 

I want to extract the nucleotide (C or T) for each reads from BAM files (BS-Seq) in some interesting positions (CpG sites). Do you know any R script can do it with well speed? 

As many colleagues mentioned that why I insist to use R. Sure, I can use perl and samtools to extract these informations, but It would be complicated to use many programming language. I know maybe be R would be slow to deal with this problem, but anyway, it is in the one platform and it would be easy for the user. Acutally, I have finished the Perl script to do it. I just want to transfer the code to R. 


# quickBamFlagSummary("Indx16_S12.sorted.clipped.bam", main.groups.only=TRUE)
flag1 <- scanBamFlag(isFirstMateRead=NA, isSecondMateRead=NA,isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param1<-ScanBamParam(flag=flag1, what="seq")
gal1 <- readGAlignments("Indx16_S12.sorted.clipped.bam", use.names=TRUE, param=param1)
is_on_minus <- as.logical(strand(gal1) == "-")
seq[is_on_minus] <- reverseComplement(seq[is_on_minus])

bf <- BamFile("Indx16_S12.sorted.clipped.bam", asMates=F)
paln <- readGAlignmentsList(bf)
j <- junctions(paln, with.revmap=TRUE)
roi <- GRanges("chr1", IRanges(10056, width=500))
j_overlap <- paln[paln %over% roi]

Best regards, 



alignment • 1.1k views
ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by Shicheng Guo7.4k

Do you absolutely have to do this with R? It's great for some things, but terrible for things like this.

ADD REPLYlink written 3.5 years ago by Devon Ryan88k

R is intended for statistical data analysis. You can use samtools mpileup to get the base calls for a base position

ADD REPLYlink written 3.5 years ago by Irsan6.8k
gravatar for glihm
3.5 years ago by
glihm590 wrote:

Hi Shicheng Guo, 

like posted by Devon Ryan, you have lot's of opportunities in order to do that. The thing is: "What orders are?" :D

1) You have to do it with R -> This package from bioconductor will be helpful! 

2) You want to try an other solution, perhaps you have some knowledge with Python -> pysam is your man.

3) With other languages, you can use SamTools in order to extract sequences and then process them with bash scripts for example or languages like C -> samTools.

Depending on the R bioconductor package performance, other programming languages may be faster and easier to extend/develop than R scripts. 
However, if you are a pro R, enjoy it with Rsamtools package!

ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by glihm590

More specifically, if R isn't needed then PileOMeth might be the simplest tool.

ADD REPLYlink written 3.5 years ago by Devon Ryan88k

I did not know this one, thank you!

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by glihm590
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 996 users visited in the last hour