Question: Remove duplicate reads in single cell bam file
21 days ago, emre.cto wrote:

Is there a tool to remove duplicate reads from a single-cell BAM file? I am currently working with a single-cell ATAC-seq file, and apparently people have used their own custom scripts for this, e.g. in this article:

single cell atac seq bam • 126 views

Did your BAM file come from 10x data? Do you have multiple cells in a single BAM file? Are you referring to deduplicating the reads based on their UMI or plain sequence?

modified 21 days ago • written 21 days ago by genomax

Yes, it is 10x data, and I have multiple cells in a single BAM file. I want to remove duplicates based on plain sequence; I do not have UMI sequences.

written 21 days ago by emre.cto

Barcodes and UMIs should be in the BAM file. In 10x RNA-seq data they are stored as optional tags: CB (corrected) and CR (raw) for the cell barcode, UB (corrected) and UR (raw) for the UMI.

Since this is 10x ATAC-seq data, can you post a couple of lines of your BAM?

You can use @Pierre's solution here to split the file by cell barcode (the CB tag in 10x output): A: Splitting a bam file by unique optional TAG field
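
To illustrate the splitting idea, here is a minimal Python sketch. It is not @Pierre's code; it works on SAM text lines and assumes the cell barcode sits in an optional tag named CB (pass a different tag name if your file differs). The function names `get_tag` and `split_by_tag` are made up for this example.

```python
from collections import defaultdict


def get_tag(sam_line, tag):
    """Return the value of an optional SAM tag (e.g. 'CB'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional fields start at column 12
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


def split_by_tag(sam_lines, tag="CB"):
    """Group alignment lines by the value of `tag`.

    Header lines (starting with '@') are returned separately so they can be
    written to every per-barcode output. Reads without the tag are grouped
    under the key None.
    """
    header = []
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
        else:
            groups[get_tag(line, tag)].append(line)
    return header, dict(groups)
```

One way to feed it is to stream the output of `samtools view -h` into `split_by_tag` and write the header plus each group to its own file. This sketch holds all reads in memory, so it is only practical for small files.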

You may need to convert the data back to FASTQ if you want to use a solution like clumpify to remove duplicates based on sequence. If by dedupe you mean removing duplicates based on existing alignment position, then please confirm that.
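
If position-based dedupe within each cell is what you are after, the logic could look roughly like the Python sketch below. This is only a simplified illustration, not an existing tool: it assumes the cell barcode is in the CB tag, reads SAM text lines, keeps the first read seen for each key, and uses the leftmost POS as reported (dedicated duplicate markers typically use the unclipped 5' position and pick the best-quality read). The function name is made up for this example.

```python
def dedup_by_barcode_and_position(sam_lines, tag="CB"):
    """Yield SAM lines, dropping reads that repeat a (barcode, position) key.

    Key = (barcode, RNAME, POS, reverse-strand bit, first/second-in-pair bits,
    RNEXT, PNEXT). Header lines and unmapped reads are passed through as-is.
    """
    seen = set()
    prefix = tag + ":"
    for line in sam_lines:
        if line.startswith("@"):  # header line
            yield line
            continue
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        if flag & 0x4:  # unmapped read: no position to compare
            yield line
            continue
        barcode = None
        for opt in f[11:]:  # optional fields start at column 12
            if opt.startswith(prefix):
                barcode = opt.split(":", 2)[2]
                break
        key = (barcode, f[2], f[3], bool(flag & 0x10), flag & 0xC0, f[6], f[7])
        if key in seen:
            continue  # same cell, same coordinates: treat as duplicate
        seen.add(key)
        yield line
```

Because the barcode is part of the key, two reads at the same coordinates but from different cells are both kept, which is the point of doing this per cell rather than on the pooled BAM.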

modified 20 days ago • written 20 days ago by genomax