Question: Remove duplicate reads in single cell bam file
21 days ago
emre.cto0 wrote:

Is there a tool to remove duplicate reads from a single-cell BAM file? I am currently working with a single-cell ATAC-seq file, and apparently people have used their own custom scripts for that. E.g. this article:

Did your BAM file come from 10x data? Do you have multiple cells in a single BAM file? Are you referring to deduplicating the reads based on their UMI or plain sequence?

Yes, it is 10x data and I have multiple cells in a single BAM file. I want to remove duplicates based on plain sequence; I do not have UMI sequences.

Barcodes and UMIs should be in the BAM file. They are encoded by the BC and CR tags (for RNA-seq data).

Since this is 10x ATAC-seq data, can you post a couple of lines of your BAM?

You can use @Pierre's solution here to split the file by cell barcodes (BC tags): A: Splitting a bam file by unique optional TAG field
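For illustration only, here is a rough sketch of the idea behind splitting by a tag. It is not @Pierre's solution. It assumes the BAM has first been dumped to SAM text (e.g. with `samtools view -h`), and the function names `get_tag` and `split_by_tag` are made up for this example. Pass whichever tag holds the cell barcode in your file: the post above mentions BC, while Cell Ranger output commonly stores the corrected cell barcode in CB.

```python
from collections import defaultdict


def get_tag(sam_line, tag):
    """Return the value of an optional SAM tag (e.g. 'CB'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag:
            return parts[2]
    return None


def split_by_tag(sam_lines, tag):
    """Group alignment lines by the value of `tag`; header lines are skipped.

    Reads without the tag end up under the key None.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        groups[get_tag(line, tag)].append(line)
    return dict(groups)
```

Each group could then be written to its own file and processed per cell. For large files a dedicated tool will be much faster than this.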

You may need to convert the data back to fastq if you want to use a solution like clumpify to remove duplicates based on sequence. If by dedupe you mean based on existing alignment position, then please confirm that.
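If alignment position is what you mean, a minimal sketch of per-cell position-based deduplication could look like the following. This is an assumption-laden example, not an established tool: it works on SAM text lines, treats two reads as duplicates when they share the same barcode tag value, reference name, leftmost position and strand, and keeps the first one seen. The function name `dedup_by_position` is hypothetical. Real duplicate marking for paired-end data usually considers both ends of the fragment and soft-clipping, which this does not.

```python
def dedup_by_position(sam_lines, tag):
    """Keep the first read per (barcode, reference, position, strand).

    `tag` is the optional SAM tag holding the cell barcode (an assumption:
    check which tag your file actually uses). Header lines and unmapped
    reads are passed through unchanged.
    """
    seen = set()
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # unmapped read: no position to compare
            kept.append(line)
            continue
        barcode = None
        for field in fields[11:]:  # optional fields, TAG:TYPE:VALUE
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == tag:
                barcode = parts[2]
                break
        strand = "-" if flag & 0x10 else "+"
        key = (barcode, fields[2], int(fields[3]), strand)
        if key in seen:
            continue  # same cell, same position, same strand: drop it
        seen.add(key)
        kept.append(line)
    return kept
```

Including the barcode in the key is the important part: identical positions in different cells are kept, since they come from independent molecules.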

