Remove duplicate reads in single cell bam file
4.8 years ago
emre.cto • 0

Is there a tool to remove duplicate reads from a single-cell BAM file? I am currently working with a single-cell ATAC-seq file, and apparently people have used their own custom scripts for this, e.g. in this article:

https://www.sciencedirect.com/science/article/pii/S0092867418308559?via

single cell atac seq bam • 1.9k views

Did your BAM file come from 10x data? Do you have multiple cells in a single BAM file? Are you referring to deduplicating the reads based on their UMI or plain sequence?


Yes, it is 10x data, and I have multiple cells in a single BAM file. I want to remove duplicates based on plain sequence; I do not have UMI sequences.


Barcodes and UMIs should be in the BAM file. They are encoded by the BC and CR tags (for RNAseq data).

Since this is 10x ATACseq data can you post a couple of lines of your BAM?

You can use @Pierre's solution to split the file by cell barcode (BC tags); see "A: Splitting a bam file by unique optional TAG field".
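To make the idea of splitting on a tag concrete, here is a minimal Python sketch. It is not @Pierre's solution, just an illustration that works on SAM text lines (for example, the output of `samtools view -h`). The function names `get_tag` and `group_by_tag` are made up for this example, and the tag name is passed in explicitly because which tag holds the cell barcode depends on the pipeline that wrote the file, so check a few lines of your BAM first.

```python
from collections import defaultdict


def get_tag(sam_line, tag):
    """Return the value of an optional SAM tag (e.g. 'CB'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional fields start at column 12
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


def group_by_tag(sam_lines, tag):
    """Group alignment lines by the value of `tag`.

    Header lines (starting with '@') are returned separately so they can be
    written to every per-barcode output. Reads without the tag end up under
    the key None.
    """
    header = []
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
        else:
            groups[get_tag(line, tag)].append(line)
    return header, dict(groups)
```

Holding every read in memory like this is only reasonable for small files; for a full 10x run you would more likely stream the lines and write to one output per barcode.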

You may need to convert the data back to fastq if you want to use a solution like clumpify to remove duplicates based on sequence. If by dedupe you mean removing duplicates based on existing alignment position, please confirm that.
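If alignment position is what you are after, the rough idea would be to treat two reads as duplicates only when they share both the cell barcode and the alignment coordinates. Below is a hedged Python sketch of that idea on SAM text lines (e.g. from `samtools view -h`). The function name `dedup_by_barcode_and_position` and the default tag `"CB"` are assumptions made for this example, and the key uses the reported POS rather than the unclipped 5' end, so it is simpler than what dedicated duplicate-marking tools do.

```python
def dedup_by_barcode_and_position(sam_lines, tag="CB"):
    """Keep the first read seen for each (barcode, position) combination.

    The key combines the barcode tag value with RNAME, POS, RNEXT, PNEXT,
    TLEN, the strand bit (0x10) and the read1/read2 bits (0xC0) of FLAG.
    Header lines are passed through unchanged. Reads without the tag share
    the barcode None and are deduplicated against each other.
    """
    seen = set()
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        barcode = None
        for field in fields[11:]:  # optional fields start at column 12
            if field.startswith(tag + ":"):
                barcode = field.split(":", 2)[2]
                break
        key = (
            barcode,
            fields[2],    # RNAME
            fields[3],    # POS
            fields[6],    # RNEXT
            fields[7],    # PNEXT
            fields[8],    # TLEN
            flag & 0x10,  # strand
            flag & 0xC0,  # first / second in pair
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return kept
```

Because the barcode is part of the key, identical fragments from different cells are both kept, which is the main difference from running a bulk position-based dedup on the whole file.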
