Question: Duplicated read names in Fastq file
4.3 years ago by
abascalfederico1.1k wrote:



I have a BAM file containing paired-end reads. This BAM file is just for storing the results of the sequencing, i.e. it does not contain mapping information.

When I convert this BAM file to a FASTQ file I can see that in some cases there are reads with duplicated names (for about 1% of the reads). For example:


If I use bamToFastq (from bedtools) to generate two FASTQ files (one for each mate of a pair), all these "duplications" are removed. Apparently, bamToFastq keeps only one read for each duplicated name, seemingly at random, and then raises a warning that the next read has no mate.
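One way to quantify the duplication before a conversion tool discards anything is to count read names directly in the FASTQ. The following is only a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequences) and that mates are distinguished in the name itself (for example by a /1 or /2 suffix). The function names and the toy records are hypothetical, not taken from the original data.

```python
from collections import Counter
import io


def read_names(handle):
    """Yield the read name from each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        # Every fourth line, starting with the first, is a header line.
        if i % 4 == 0 and line.strip():
            # Drop the leading '@' and anything after the first whitespace.
            yield line[1:].split()[0]


def duplicated_names(handle):
    """Return {name: count} for every name seen more than once."""
    counts = Counter(read_names(handle))
    return {name: n for name, n in counts.items() if n > 1}


# Toy input (made up for illustration): read2/1 appears twice.
example = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read2/1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2/1\nACG\n+\nIII\n"
)
print(duplicated_names(example))
```

If the mates share an identical name with no suffix, every name in an interleaved file would be expected to appear twice, so the threshold in duplicated_names would need adjusting.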

Is it normal to have this kind of read-name duplication? What would be the best way to handle these duplicates?

Many thanks,



sequencing next-gen • 2.2k views
modified 11 months ago by Diedes0 • written 4.3 years ago by abascalfederico1.1k

How did you make the BAM file? I suspect that the origin of this issue is there.

written 4.3 years ago by Devon Ryan92k

Did you check to see if the reads with duplicated names are identical?
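A check like this could be sketched as follows: group the FASTQ records by name and, for each name seen more than once, report whether the sequences are identical and how long each one is. This assumes plain 4-line FASTQ records. The function names and the toy input are hypothetical.

```python
from collections import defaultdict
import io


def parse_fastq(handle):
    """Yield (name, sequence) from 4-line FASTQ records."""
    for header in handle:
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = next(handle, "").rstrip("\n")
        next(handle, "")  # '+' separator line
        next(handle, "")  # quality line
        yield header[1:].split()[0], seq


def compare_duplicates(handle):
    """Summarise every read name that occurs more than once."""
    by_name = defaultdict(list)
    for name, seq in parse_fastq(handle):
        by_name[name].append(seq)
    return {
        name: {
            "identical": len(set(seqs)) == 1,
            "lengths": [len(s) for s in seqs],
        }
        for name, seqs in by_name.items()
        if len(seqs) > 1
    }


# Toy input (made up for illustration).
example = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read2/1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2/1\nACG\n+\nIII\n"
)
print(compare_duplicates(example))
```

The "lengths" field makes it easy to see whether one copy is consistently shorter than the other.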

written 4.3 years ago by h.mon27k

H.mon: No, they are different. In fact, in these cases one of the redundant reads is usually much shorter than the typical read length.

Devon: I have asked how these BAM files were generated from the sequencing. Waiting for an answer.


written 4.3 years ago by abascalfederico1.1k

Could this perhaps be a case where a read was mapped more than once (e.g. by BWA-MEM)? I'm not sure whether converting a BAM with multi-mapped reads to FASTQ format would cause this, but it might.
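One way to test this idea is to look for records whose SAM FLAG has the secondary (0x100) or supplementary (0x800) bit set, since those are additional alignment lines for a read that already has a primary record. The sketch below assumes the BAM has first been dumped to SAM text (for example with samtools view -h). The function name and the toy records are hypothetical. Note that the original poster described the BAM as unmapped, in which case no such records would be expected.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def non_primary_records(sam_lines):
    """Yield (read name, flag) for secondary or supplementary records."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment line
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            yield fields[0], flag


# Toy SAM text (made up): r1 is an unmapped pair, r2 has a
# supplementary record (2113 = 1 + 64 + 2048).
example = [
    "@HD\tVN:1.6\n",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",
    "r2\t2113\tchr1\t100\t60\t4H3M\t=\t300\t0\tACG\tIII\n",
]
print(list(non_primary_records(example)))
```

An empty result would suggest that the duplicated names do not come from secondary or supplementary alignments.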

modified 4.1 years ago • written 4.1 years ago by zlskidmore290