6.4 years ago
emre.cto
Is there a tool to remove duplicate reads from a single-cell BAM file? I am currently working with a single-cell ATAC-seq file, and apparently people have used their own custom scripts for this, e.g. this article:
https://www.sciencedirect.com/science/article/pii/S0092867418308559?via
Did your BAM file come from 10x data? Do you have multiple cells in a single BAM file? Are you referring to deduplicating the reads based on their UMI or plain sequence?
Yes, it is 10x data, and I have multiple cells in a single BAM file. I want to remove duplicates based on plain sequence; I do not have UMI sequences.
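A minimal sketch of what "remove duplicates based on plain sequence, per cell" could look like, working on SAM text lines (e.g. piped from `samtools view -h`). Assumptions: the cell barcode sits in an optional tag (the thread mentions BC, but check your own BAM), `dedup_by_sequence` and `original_read_seq` are hypothetical helper names, each record is treated on its own (paired-end ATAC data would need both mates in the key), and the first record seen for a given (barcode, sequence) pair is kept.

```python
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_read_seq(fields: list[str]) -> str:
    """Return the read sequence in the orientation it was sequenced.

    SAM stores SEQ on the forward reference strand, so records with
    FLAG bit 0x10 (reverse strand) are reverse-complemented back.
    """
    flag, seq = int(fields[1]), fields[9]
    if flag & 0x10:
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


def dedup_by_sequence(sam_lines: Iterable[str], tag: str = "BC") -> Iterator[str]:
    """Yield the first SAM record seen for each (cell barcode, read sequence)."""
    seen: set[tuple[str, str]] = set()
    prefix = tag + ":"  # optional fields look like TAG:TYPE:VALUE
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        barcode = next(
            (f.split(":", 2)[2] for f in fields[11:] if f.startswith(prefix)), None
        )
        if barcode is None:
            yield line  # assumed policy: keep reads that carry no barcode
            continue
        key = (barcode, original_read_seq(fields))
        if key not in seen:
            seen.add(key)
            yield line
```

Comparing sequences in their original orientation means a read and its exact duplicate are matched even if the aligner reported them on opposite strands.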
Barcodes and UMIs should be in the BAM file. They are encoded by the BC and CR tags (for RNA-seq data). Since this is 10x ATAC-seq data, can you post a couple of lines of your BAM?
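In SAM text, tags such as BC and CR appear as tab-separated optional fields after the 11 mandatory columns, each written as TAG:TYPE:VALUE. A small sketch (the function name `parse_optional_tags` is hypothetical) that turns those fields into a dictionary, which makes it easy to see which tag actually carries the cell barcode in a given BAM:

```python
def parse_optional_tags(sam_line: str) -> dict[str, object]:
    """Map each optional SAM tag to its value, converting i/f types."""
    tags: dict[str, object] = {}
    # Columns 12 onward (index 11+) are the optional TAG:TYPE:VALUE fields.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # A, Z, H and B values are kept as strings here
    return tags
```

For example, feeding it one line of `samtools view` output shows every tag present on that record.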
You can use @Pierre's solution here to split the file by cell barcode (BC tags): A: Splitting a bam file by unique optional TAG field

You may need to convert the data back to FASTQ if you want to use a tool like clumpify to remove duplicates based on sequence. If by dedupe you mean removing duplicates based on existing alignment position, then please confirm that.
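If deduplication by alignment position is what is wanted, the idea can be sketched as: group aligned records by cell barcode, then keep one record per (chromosome, 5' position, strand) within each cell. This is a simplified illustration, not Pierre's solution or any tool's exact algorithm. Assumptions: the barcode tag is BC (check your BAM), the helper names are hypothetical, the first record seen is kept rather than the best-quality one, mates are treated independently, and the 5' end is taken from the aligned (not soft-clip-adjusted) position, whereas tools such as Picard MarkDuplicates account for soft clipping.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_length(cigar: str) -> int:
    """Number of reference bases spanned by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def five_prime_key(fields: list[str]) -> tuple[str, int, str]:
    """(chrom, 5' reference position, strand) of an aligned SAM record."""
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x10:  # reverse strand: the 5' end is the rightmost aligned base
        return chrom, pos + reference_length(cigar) - 1, "-"
    return chrom, pos, "+"


def split_by_barcode(sam_lines, tag: str = "BC") -> dict[str, list[list[str]]]:
    """Group aligned SAM records by the value of a cell-barcode tag."""
    prefix = tag + ":"
    cells: dict[str, list[list[str]]] = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:
            continue  # skip unmapped reads
        barcode = next(
            (f.split(":", 2)[2] for f in fields[11:] if f.startswith(prefix)), None
        )
        if barcode is not None:
            cells[barcode].append(fields)
    return cells


def dedup_by_position(cells: dict[str, list[list[str]]]) -> dict[str, list[str]]:
    """Keep the first read name per (chrom, 5' position, strand) in each cell."""
    kept: dict[str, list[str]] = {}
    for barcode, records in cells.items():
        seen: set[tuple[str, int, str]] = set()
        kept[barcode] = []
        for fields in records:
            key = five_prime_key(fields)
            if key not in seen:
                seen.add(key)
                kept[barcode].append(fields[0])
    return kept
```

Usage could look like `dedup_by_position(split_by_barcode(sys.stdin))` with SAM text piped in from `samtools view -h`; the returned read names could then be used to filter the original BAM. Because duplicates are only compared within a cell, two cells that happen to share a fragment position both keep their read.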