rugarem • 23 months ago
I am testing how different variant callers handle duplicate reads, and I want to see how performance changes as the number of duplicate reads in a BAM file increases. I know there are lots of tools for removing or marking duplicate reads, but does anyone know of any tools/scripts for making duplicate reads?
For example, make 5×, 10×, or 15× duplicate copies of each read and its mate in this BAM file.
I've tried simply copying the lines in the text (SAM) output, but this causes problems with some tools downstream, probably because the copied reads are byte-for-byte identical.
Thank you!
Do you know any languages? This would be a fun motivation to learn a few lines of Python or Perl.
I'm an early intermediate at python. That's my next step if this doesn't already exist!
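Since Python came up, here is a minimal sketch of the idea. It operates on SAM text (convert with `samtools view -h` first) rather than BAM directly, and it appends a hypothetical `.dupK` suffix to each copy's QNAME so the copies are not byte-identical, which may avoid the downstream problems mentioned above. Because mates share a QNAME, both mates of a pair get the same suffix and stay paired. The script name `dup_reads.py` is just an example.

```python
import sys

def duplicate_sam(lines, n_copies):
    """Yield each SAM record plus n_copies extra copies.

    Extra copies get a ".dupK" suffix on QNAME (field 1) so the
    duplicated records are not byte-identical. Header lines ("@...")
    pass through once, unmodified.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line
            continue
        yield line  # the original record
        fields = line.split("\t")
        for k in range(1, n_copies + 1):
            copy = fields[:]
            copy[0] = f"{fields[0]}.dup{k}"
            yield "\t".join(copy)

if __name__ == "__main__":
    # Tiny self-contained demo: one header line and one paired-end record.
    demo = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tFFFFFFFFFF",
    ]
    for rec in duplicate_sam(demo, 2):
        print(rec)
```

In a pipeline it could be wired up roughly as `samtools view -h in.bam | python dup_reads.py | samtools view -b -o out.bam` (untested; adjust to taste). Note the copies keep the same coordinates, so duplicate-marking tools should still recognize them as duplicates by position.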
Not answering your question, but you can try clumpify.sh to count (and remove) duplicates in an alignment-free manner: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files.
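For reference, a typical Clumpify invocation looks something like this (a sketch based on the BBTools in=/out= parameter convention; the filenames are placeholders, and you should check clumpify.sh's built-in help for the exact options):

```shell
# Cluster reads and remove exact duplicates, no alignment needed (BBTools Clumpify).
# Drop dedupe=t to only clumpify/compress without removing duplicates.
clumpify.sh in=reads.fastq.gz out=clumped.fastq.gz dedupe=t
```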