Making duplicate reads in Bam file
23 months ago
rugarem • 0

I am testing the performance of different variant callers in dealing with duplicate reads. I want to look at the effect of increasing duplicate reads in a bam file on performance. I know there's lots of tools for removing duplicate or marking duplicate reads, but does anyone know of any tools/scripts for making duplicate reads?

For example, make 5, 10, 15x duplicate copies of each read and its mate in this bam file.

I've tried simply copying the read lines in the file, but this causes problems with some tools downstream, probably because the copies are byte-identical (including the read names).
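A minimal sketch of one workaround, operating on SAM text (function name and `_dup` suffix scheme are illustrative, not an existing tool): emit each record N times, appending a per-copy suffix to the read name so the copies are no longer byte-identical. Because the suffix depends only on the original name and the copy index, both mates of a pair receive matching names and pairing survives.

```python
def multiply_sam_records(sam_lines, n_copies=5):
    """Emit each SAM alignment record n_copies times.

    Header lines (starting with '@') pass through once. Each extra
    copy gets a '_dupN' suffix on the read name (column 1) so copies
    are distinguishable; mates share the suffix because it is derived
    only from the original name and the copy index.
    """
    out = []
    for line in sam_lines:
        if line.startswith("@"):
            out.append(line)          # header: keep once, unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[0]
        out.append("\t".join(fields))  # original record
        for i in range(1, n_copies):
            fields[0] = f"{name}_dup{i}"
            out.append("\t".join(fields))
    return out
```

For a BAM you would wrap this with samtools, e.g. `samtools view -h in.bam`, pipe through the script, then `samtools view -b` (copies sit at the same coordinates as their originals, so coordinate order is preserved).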

Thank you!

duplicate bam fastq reads NGS

Do you know any languages? This would be a fun motivation to learn a few lines of Python or Perl.


I'm an early intermediate at python. That's my next step if this doesn't already exist!


Not answering your question directly, but you can try clumpify.sh to count, and optionally remove, duplicates in an alignment-free manner: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files.
