Remove duplicate reads in single cell bam file

4.8 years ago

emre.cto • 0

Is there a tool to remove duplicate reads from a single-cell BAM file? I am currently working with a single-cell ATAC-seq file, and apparently people have used their own custom scripts for this, e.g. in this article:

https://www.sciencedirect.com/science/article/pii/S0092867418308559?via

single cell atac seq bam • 1.9k views

ADD COMMENT • link 4.8 years ago by emre.cto • 0

Did your BAM file come from 10x data? Do you have multiple cells in a single BAM file? Are you referring to deduplicating the reads based on their UMI or plain sequence?

ADD REPLY • link 4.8 years ago by GenoMax 141k

Yes, it is 10x data, and I have multiple cells in a single BAM file. I want to remove duplicates based on plain sequence; I do not have UMI sequences.

ADD REPLY • link 4.8 years ago by emre.cto • 0

Barcodes and UMIs should be in the BAM file. For 10x RNA-seq data the cell barcode is in the CR (raw) and CB (corrected) tags, and the UMI is in the UR (raw) and UB (corrected) tags.

Since this is 10x ATAC-seq data, can you post a couple of lines of your BAM?

You can use @Pierre's solution here to split the file by cell barcode (the CB tag in 10x BAM files): A: Splitting a bam file by unique optional TAG field
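
To give a rough idea of what splitting by barcode involves, here is a minimal Python sketch that groups SAM text records (for example, lines piped from `samtools view -h`) by an optional tag. This is not @Pierre's solution; the function names `get_tag` and `group_by_tag` are made up for this illustration, and the `CB` default is an assumption about where your BAM stores the cell barcode.

```python
from collections import defaultdict

def get_tag(sam_line, tag):
    """Return the value of an optional SAM tag (e.g. 'CB'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag:
            return parts[2]
    return None

def group_by_tag(sam_lines, tag="CB"):
    """Group alignment lines by tag value (assumed default: 'CB').

    Header lines (starting with '@') and blank lines are skipped.
    Reads without the tag are collected under the key None.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        groups[get_tag(line, tag)].append(line)
    return dict(groups)
```

Each group could then be written to its own file. For a large BAM with many cells, holding every record in memory like this will not scale, so treat it as an illustration of the idea rather than a production tool.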

You may need to convert the data back to FASTQ if you want to use a tool like clumpify to remove duplicates based on sequence. If by dedupe you mean removing duplicates based on alignment position, then please confirm that.
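
If position-based deduplication within each cell is what you are after, the logic would look roughly like the sketch below. It works on SAM text lines and keeps the first alignment seen for each combination of barcode, chromosome, start position, strand, mate position and template length. The function name `dedupe_by_position` and the `CB` tag default are assumptions for illustration, and the key is deliberately simplified: dedicated duplicate-marking tools compare unclipped 5' ends and handle pairs more carefully.

```python
def dedupe_by_position(sam_lines, barcode_tag="CB"):
    """Drop alignments that repeat an already-seen position within the same barcode.

    Simplified key: (barcode, RNAME, POS, reverse-strand flag, PNEXT, TLEN).
    Header lines are passed through unchanged.
    """
    seen = set()
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        f = line.rstrip("\n").split("\t")
        # Look up the barcode among the optional fields (column 12 onward).
        barcode = None
        for field in f[11:]:
            if field.startswith(barcode_tag + ":"):
                barcode = field.split(":", 2)[2]
                break
        reverse = bool(int(f[1]) & 0x10)  # FLAG bit 0x10: read on reverse strand
        key = (barcode, f[2], int(f[3]), reverse, f[7], f[8])
        if key in seen:
            continue  # same cell, same position: treat as a duplicate
        seen.add(key)
        kept.append(line)
    return kept
```

Because the barcode is part of the key, identical fragments from different cells are both kept, which is the behaviour you want for single-cell data.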

ADD REPLY • link 4.8 years ago by GenoMax 141k
