Question: R: extract interest nucleotide from BAM files (BS-Seq)
2
gravatar for Shicheng Guo
4.1 years ago by
Shicheng Guo7.8k
Shicheng Guo7.8k wrote:

Hi colleagues, 

I want to extract the nucleotide (C or T) for each reads from BAM files (BS-Seq) in some interesting positions (CpG sites). Do you know any R script can do it with well speed? 

As many colleagues mentioned that why I insist to use R. Sure, I can use perl and samtools to extract these informations, but It would be complicated to use many programming language. I know maybe be R would be slow to deal with this problem, but anyway, it is in the one platform and it would be easy for the user. Acutally, I have finished the Perl script to do it. I just want to transfer the code to R. 

library("Rsamtools")
library("GenomicRanges")
library("GenomicAlignments")

setwd("/home/shg047/bam")
# quickBamFlagSummary("Indx16_S12.sorted.clipped.bam", main.groups.only=TRUE)
flag1 <- scanBamFlag(isFirstMateRead=NA, isSecondMateRead=NA,isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param1<-ScanBamParam(flag=flag1, what="seq")
gal1 <- readGAlignments("Indx16_S12.sorted.clipped.bam", use.names=TRUE, param=param1)
seq<-mcols(gal1)$seq
is_on_minus <- as.logical(strand(gal1) == "-")
seq[is_on_minus] <- reverseComplement(seq[is_on_minus])

bf <- BamFile("Indx16_S12.sorted.clipped.bam", asMates=F)
paln <- readGAlignmentsList(bf)
j <- junctions(paln, with.revmap=TRUE)
roi <- GRanges("chr1", IRanges(10056, width=500))
findOverlaps(paln,roi)
j_overlap <- paln[paln %over% roi]
paln[j_overlap$revmap[[1]]]

Best regards, 

 

 

alignment • 1.3k views
ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by Shicheng Guo7.8k
1

Do you absolutely have to do this with R? It's great for some things, but terrible for things like this.

ADD REPLYlink written 4.1 years ago by Devon Ryan92k

R is intended for statistical data analysis. You can use samtools mpileup to get the base calls for a base position

ADD REPLYlink written 4.1 years ago by Irsan7.0k
0
gravatar for glihm
4.1 years ago by
glihm610
France
glihm610 wrote:

Hi Shicheng Guo, 

like posted by Devon Ryan, you have lot's of opportunities in order to do that. The thing is: "What orders are?" :D

1) You have to do it with R -> This package from bioconductor will be helpful! 

2) You want to try an other solution, perhaps you have some knowledge with Python -> pysam is your man.

3) With other languages, you can use SamTools in order to extract sequences and then process them with bash scripts for example or languages like C -> samTools.


Depending on the R bioconductor package performance, other programming languages may be faster and easier to extend/develop than R scripts. 
However, if you are a pro R, enjoy it with Rsamtools package!

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by glihm610
1

More specifically, if R isn't needed then PileOMeth might be the simplest tool.

ADD REPLYlink written 4.1 years ago by Devon Ryan92k

I did not know this one, thank you!

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by glihm610
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1629 users visited in the last hour