Duplicated read names in Fastq file
8.8 years ago
abascalfederico ★ 1.2k

Hi,

I have a BAM file containing paired-end reads. This BAM file is just for storing the results of the sequencing, i.e. it does not contain mapping information.

When I convert this BAM file to a FASTQ file I can see that in some cases there are reads with duplicated names (for about 1% of the reads). For example:

@HS34_15849:1:1101:1065:15188#26/1
@HS34_15849:1:1101:1065:15188#26/2
@HS34_15849:1:1101:1065:15188#26/2
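One way to list such names is sketched below. It assumes a strict four-lines-per-record FASTQ and treats the header up to the first whitespace (including the /1 or /2 suffix) as the read name. The function name `duplicated_read_names` and the sequences in the example are made up for illustration; only the read names come from the example above.

```python
from collections import Counter
from itertools import islice


def duplicated_read_names(fastq_lines):
    """Return {read_name: count} for names that occur more than once.

    Assumption: every record spans exactly 4 lines (no wrapped sequences).
    """
    counts = Counter()
    # Header lines are lines 0, 4, 8, ... of a 4-line-per-record FASTQ.
    for header in islice(fastq_lines, 0, None, 4):
        fields = header.strip()[1:].split()
        if header.startswith("@") and fields:
            counts[fields[0]] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Illustrative records: names from the post, sequences and qualities invented.
example = [
    "@HS34_15849:1:1101:1065:15188#26/1", "ACGTACGT", "+", "IIIIIIII",
    "@HS34_15849:1:1101:1065:15188#26/2", "TTGCAAGC", "+", "IIIIIIII",
    "@HS34_15849:1:1101:1065:15188#26/2", "TTG", "+", "III",
]
print(duplicated_read_names(example))
```

For a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.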

If I use bamToFastq (from bedtools) to generate two FASTQ files (one per mate), all of these "duplications" are removed. Apparently, bamToFastq keeps only one read for each duplicated name, seemingly at random, and then raises a warning that the next read has no mate.

Is it normal to have this kind of read name duplication? What would be the best way to handle these duplications?

Many thanks,
Federico

next-gen-sequencing • 3.7k views

How did you make the BAM file? I suspect that the origin of this issue is there.


Did you check to see if the reads with duplicated names are identical?


H.mon: No, they are different. In fact, in these cases one of the redundant reads is usually much shorter than the typical read length.
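A check along these lines can be sketched as follows: group the FASTQ records by name and report, for each duplicated name, the sequence lengths and whether the sequences are identical. This again assumes four lines per record; `duplicate_summary` is a hypothetical helper name and the example sequences are invented.

```python
from collections import defaultdict


def duplicate_summary(fastq_lines):
    """For each duplicated read name, report sequence lengths and identity.

    Assumption: every record spans exactly 4 lines (no wrapped sequences).
    """
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    seqs_by_name = defaultdict(list)
    # Step through complete records only; a trailing partial record is ignored.
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        fields = header[1:].split()
        if header.startswith("@") and fields:
            seqs_by_name[fields[0]].append(seq)
    return {
        name: {
            "lengths": [len(s) for s in group],
            "identical": len(set(group)) == 1,
        }
        for name, group in seqs_by_name.items()
        if len(group) > 1
    }


# Invented records: one full-length read and one much shorter read share a name.
example = [
    "@read_a/1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read_a/2", "TTGCAAGCTA", "+", "IIIIIIIIII",
    "@read_a/2", "TTG", "+", "III",
]
print(duplicate_summary(example))
```

If the shorter copy turns out to be a prefix or trimmed version of the longer one, that would be a hint about which processing step produced it, but that is something to confirm with whoever generated the BAM.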

Devon: I have asked how these BAM files were generated from the sequencing. Waiting for an answer.


Could this be a case where a read was mapped more than once (e.g. by BWA-MEM)? I'm not sure whether converting a BAM with multi-mapped reads to FASTQ would cause this, but it might.
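One way to test this idea is to look at the FLAG field of the records: in the SAM format, bit 0x100 marks a secondary alignment and bit 0x800 a supplementary one. The sketch below scans SAM text lines (for example, the output of `samtools view`) and collects read names that have any such record. The function name `non_primary_names` and the example records are invented for illustration.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def non_primary_names(sam_lines):
    """Return the set of read names with a secondary or supplementary record."""
    names = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            names.add(qname)
    return names


# Invented records, truncated to the first three SAM columns for brevity.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1",
    "r1\t147\tchr1",
    "r2\t99\tchr1",
    "r2\t147\tchr1",
    "r2\t2147\tchr2",  # 2147 = 2048 (supplementary) + 99
]
print(non_primary_names(example))
```

An empty result on the real BAM would argue against the multi-mapping explanation, which would fit the original statement that the file carries no mapping information.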
