I have a BAM file containing paired-end reads. This BAM file only stores the sequencing output, i.e. it is unaligned and contains no mapping information.
When I convert this BAM file to FASTQ, about 1% of the reads turn out to have duplicated names. For example:
@HS34_15849:1:1101:1065:15188#26/1
@HS34_15849:1:1101:1065:15188#26/2
@HS34_15849:1:1101:1065:15188#26/2
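For reference, here is a minimal sketch of how such duplicated names can be counted in the converted FASTQ. It assumes a strict four-lines-per-record FASTQ (no wrapped sequence lines) and that the `/1` and `/2` suffixes are part of the read name; the function name `count_duplicate_names` is just an illustrative choice, not part of any tool.

```python
from collections import Counter
import io


def count_duplicate_names(fastq_handle):
    """Return a Counter of read names that occur more than once.

    Assumption: every record spans exactly four lines, so the header
    is every line whose zero-based index is a multiple of four.
    """
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 != 0:
            continue  # not a header line
        header = line.strip()
        if not header:
            continue  # tolerate a trailing blank line
        # Drop the leading '@' and anything after the first whitespace
        name = header[1:].split()[0]
        counts[name] += 1
    # Keep only the names seen more than once
    return Counter({name: n for name, n in counts.items() if n > 1})


# Toy input reproducing the pattern from the example above
example = io.StringIO(
    "@HS34_15849:1:1101:1065:15188#26/1\nACGT\n+\nIIII\n"
    "@HS34_15849:1:1101:1065:15188#26/2\nACGT\n+\nIIII\n"
    "@HS34_15849:1:1101:1065:15188#26/2\nACGT\n+\nIIII\n"
)
print(count_duplicate_names(example))
```

On a real file, the same function can be called with an opened file handle, e.g. `with open("reads.fastq") as fh: count_duplicate_names(fh)` (the file name here is hypothetical).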
If I use bamToFastq (from bedtools) to generate two FASTQ files (one for each mate of a pair), all these "duplications" are removed. Apparently, bamToFastq retains one copy of each duplicated read name, seemingly at random, and then raises a warning about the next read having no mate.
Is it normal to have this kind of read name duplication? What would be the best way to handle these duplicates?