4.8 years ago
emre.cto
Is there a tool to remove duplicate reads from a single-cell BAM file? I am currently working with a single-cell ATAC-seq file, and apparently people have used their own custom scripts for this, e.g. in this article:
https://www.sciencedirect.com/science/article/pii/S0092867418308559?via
Did your BAM file come from 10x data? Do you have multiple cells in a single BAM file? Are you referring to deduplicating the reads based on their UMI or plain sequence?
Yes, it is 10x data, and I have multiple cells in a single BAM file. I want to remove duplicates based on plain sequence; I do not have UMI sequences.
Barcodes and UMIs should be in the BAM file. They are encoded by the `BC` and `CR` tags (for RNAseq data). Since this is 10x ATACseq data, can you post a couple of lines of your BAM?
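To see which tags a file actually carries, it can help to look at the optional fields of a few records (for example, text lines produced by `samtools view`). Below is a minimal sketch in plain Python that parses those fields from one SAM-format line. The record, the tag names and the barcode values are made up for illustration; the tags in a real 10x ATAC BAM depend on the pipeline that produced it.

```python
def parse_tags(sam_line):
    """Return the optional fields of one SAM text record as a {tag: value} dict.

    A SAM record has 11 mandatory tab-separated columns; anything after
    that is an optional field of the form TAG:TYPE:VALUE.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Split at most twice so that values containing ':' stay intact.
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


# Hypothetical record: read name, flag, reference, position, MAPQ, CIGAR,
# mate reference, mate position, template length, sequence, qualities, tags.
example = "\t".join([
    "read1", "0", "chr1", "100", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "FFFFFFFFFF", "CR:Z:AAACGAAT", "CB:Z:AAACGAAT-1",
])

print(parse_tags(example))
```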
You can use @Pierre's solution here to split the file by cell barcodes (`BC` tags): "A: Splitting a bam file by unique optional TAG field". You may need to convert the data back to fastq if you want to use a solution like `clumpify` to remove duplicates based on sequence. If by `dedupe` you mean based on existing alignment position, then please confirm that.
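As a rough illustration of the idea discussed above (split by barcode tag first, then drop reads with identical sequence within each cell), here is a minimal sketch in plain Python that works on SAM text lines. It is not @Pierre's solution and not what `clumpify` does internally; it is only a simple exact-match filter. The function names, the example records and the `CB` tag used in the example are assumptions, and the tag name is passed in as a parameter because it depends on how the BAM was produced.

```python
from collections import defaultdict


def get_tag(sam_line, tag):
    """Return the value of one optional tag in a SAM text line, or None."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(tag + ":"):
            return field.split(":", 2)[2]
    return None


def split_by_tag(sam_lines, tag):
    """Group alignment records by the value of `tag`; header lines are skipped."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        groups[get_tag(line, tag)].append(line)
    return dict(groups)


def dedupe_by_sequence(sam_lines):
    """Keep only the first record seen for each distinct SEQ (column 10)."""
    seen = set()
    kept = []
    for line in sam_lines:
        seq = line.split("\t")[9]
        if seq not in seen:
            seen.add(seq)
            kept.append(line)
    return kept


def make_record(name, seq, barcode):
    """Build a made-up SAM line for the example below (hypothetical helper)."""
    return "\t".join([name, "0", "chr1", "100", "60", f"{len(seq)}M",
                      "*", "0", "0", seq, "F" * len(seq), f"CB:Z:{barcode}"])


records = [
    make_record("r1", "AAAA", "cellA"),
    make_record("r2", "AAAA", "cellA"),  # same sequence, same cell: duplicate
    make_record("r3", "AAAA", "cellB"),  # same sequence, other cell: kept
    make_record("r4", "CCCC", "cellA"),
]

per_cell = split_by_tag(records, "CB")
deduped = {cell: dedupe_by_sequence(lines) for cell, lines in per_cell.items()}
for cell, lines in deduped.items():
    print(cell, [line.split("\t")[0] for line in lines])
```

One caveat with this sketch: the SEQ column of an aligned record is stored reverse-complemented for reads mapped to the reverse strand, so comparing SEQ in a BAM is not exactly the same as comparing the original fastq sequences.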