I'd like to use Rsamtools to remove duplicate reads from a bam file. I'm looking for a solution similar to samtools markdup -r -s in.bam out.bam. Can anyone tell me how to do this in R, preferably with Rsamtools? But other solutions are also fine :)
I'd like to use Rsamtools to remove duplicate reads from a bam file. I'm looking for a solution similar to samtools markdup -r -s in.bam out.bam. Can anyone tell me how to do this in R, preferably with Rsamtools? But other solutions are also fine :)
You can run any system command from inside R using system()
. This works well if there is a single command without many dependencies. For example;
samtools.rmdup.cmd <- paste0("/path/to/samtools markdup -r -s ", in.bam, " ", out.bam)
lapply(samtools.rmdup.cmd, function(x) system(x))
If the command has dependencies, you can create a conda and execute the command path that exists in the conda, or alternatively write a bash file from within R where all the dependencies (ex modules) are loaded before running the commands, and then call the bash file from within R, using for example system("/path/to/samtools_cmds.sh")
.
What I described above doesn't require any R packages, only that the programs are installed on the computer.
My honest opinion: Do processing of sequencing data on the standard command line with samtools. There is no need for the added complexity to do that with a wrapper from inside R, really. R is for analysis and plotting, not for these sorts of operations. Usage for markdup is in the samtools manual: http://www.htslib.org/doc/samtools-markdup.html
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
bash, nextflow, snakemake