rugarem • 23 months ago
I am testing how different variant callers handle duplicate reads, and I want to see how performance changes as the number of duplicate reads in a BAM file increases. I know there are lots of tools for removing or marking duplicate reads, but does anyone know of any tools/scripts for making duplicate reads?
For example, make 5×, 10×, or 15× duplicate copies of each read and its mate in this BAM file.
I've tried simply copying the lines in the text (SAM) output, but this causes problems with some tools downstream, probably because the copied reads are byte-for-byte identical.
Thank you!
Do you know any languages? This would be a fun motivation to learn a few lines of Python or Perl.
I'm an early intermediate at python. That's my next step if this doesn't already exist!
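Since Python came up, here is a minimal sketch of the idea. It operates on SAM text (convert with `samtools view -h` first) rather than BAM directly, and it appends a hypothetical `.dupK` suffix to each copy's QNAME so the copies are not byte-identical, which may avoid the downstream problems mentioned above. Because mates share a QNAME, both mates of a pair get the same suffix and stay paired. The script name `dup_reads.py` is just an example.

```python
import sys

def duplicate_sam(lines, n_copies):
    """Yield each SAM record plus n_copies extra copies.

    Extra copies get a ".dupK" suffix on QNAME (field 1) so the
    duplicated records are not byte-identical. Header lines ("@...")
    pass through once, unmodified.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line
            continue
        yield line  # the original record
        fields = line.split("\t")
        for k in range(1, n_copies + 1):
            copy = fields[:]
            copy[0] = f"{fields[0]}.dup{k}"
            yield "\t".join(copy)

if __name__ == "__main__":
    # Tiny self-contained demo: one header line and one paired-end record.
    demo = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tFFFFFFFFFF",
    ]
    for rec in duplicate_sam(demo, 2):
        print(rec)
```

In a pipeline it could be wired up roughly as `samtools view -h in.bam | python dup_reads.py | samtools view -b -o out.bam` (untested; adjust to taste). Note the copies keep the same coordinates, so duplicate-marking tools should still recognize them as duplicates by position.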
Not answering your question, but you can try clumpify.sh to count (and remove) duplicates in an alignment-free manner: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files.
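For reference, a typical Clumpify invocation looks something like this (a sketch based on the BBTools in=/out= parameter convention; the filenames are placeholders, and you should check clumpify.sh's built-in help for the exact options):

```shell
# Cluster reads and remove exact duplicates, no alignment needed (BBTools Clumpify).
# Drop dedupe=t to only clumpify/compress without removing duplicates.
clumpify.sh in=reads.fastq.gz out=clumped.fastq.gz dedupe=t
```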