Question

The number of reads in the BAM file is greater than the number of reads in its original FASTQ file



4.1 years ago

Gary ▴ 480

Hi,

I have a FASTQ file (Origin_ELW24.fastq.gz) with 107,856,041 single-end 75 bp reads. After trimming and alignment with STAR, we get a BAM file (ELW24.bam). I used the commands below to convert the BAM file into a new FASTQ file (New_ELW24.fastq.gz). The new FASTQ file contains 122,444,250 reads, which is more than the original FASTQ file. I don't understand why the number of reads would increase after trimming and alignment. Could you help me? Many thanks.

bedtools bamtofastq -i ELW24.bam -fq New_ELW24.fastq
gzip -c New_ELW24.fastq > New_ELW24.fastq.gz
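To compare the two totals directly, a minimal sketch is to count FASTQ records from the raw text. It assumes standard 4-line FASTQ records (header, sequence, `+` separator, qualities) with no wrapped sequence lines; `count_fastq_reads` is a hypothetical helper name, not part of bedtools or any other package.

```shell
# Count reads in a FASTQ stream on stdin.
# Assumes every record is exactly 4 lines (no line-wrapped sequences),
# so the read count is the total line count divided by 4.
count_fastq_reads() {
  awk 'END { print int(NR / 4) }'
}
```

For example, `gzip -dc Origin_ELW24.fastq.gz | count_fastq_reads` and `gzip -dc New_ELW24.fastq.gz | count_fastq_reads` should reproduce the two totals quoted above.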

Best,

Gary

RNA-Seq alignment BAM FASTQ

updated 4.1 years ago by h.mon 35k • written 4.1 years ago by Gary ▴ 480

score 2 · Answer 1 · 2020-03-21

From the bedtools bamtofastq documentation:

"each alignment in the BAM file is converted to a FASTQ record in the -fq file."

The extra reads are probably multi-mapping reads: a read that does not map uniquely can be aligned to the reference more than once, and each of those alignments becomes a separate FASTQ record.

https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
https://bedtools.readthedocs.io/en/latest/content/tools/bamtofastq.html
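If the goal is one FASTQ record per read, one option (a sketch, assuming only primary alignments are wanted) is to drop the extra alignments before converting. In the SAM format, flag bit 0x100 (256) marks a secondary alignment and 0x800 (2048) a supplementary one. The hypothetical `primary_only` function below applies that filter to SAM text so the flag logic is visible; in practice `samtools view -b -F 0x900 ELW24.bam > ELW24.primary.bam` does the same filtering directly on the BAM, and the result can then be passed to `bedtools bamtofastq`.

```shell
# Keep only primary alignments from SAM text on stdin.
# FLAG bit 0x100 (256)  = secondary alignment
# FLAG bit 0x800 (2048) = supplementary alignment
# Header lines (starting with '@') are passed through unchanged.
primary_only() {
  awk -F '\t' '
    /^@/ { print; next }
    {
      flag = $2 + 0
      secondary     = int(flag / 256)  % 2
      supplementary = int(flag / 2048) % 2
      if (!secondary && !supplementary) print
    }'
}
```

Even after this filter, the FASTQ only contains reads that are present in the BAM. STAR does not write unmapped reads unless asked to (`--outSAMunmapped Within`), and trimming may also discard reads, so the filtered count can end up below 107,856,041.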

score 2 · Answer 2 · 2020-03-21



4.1 years ago

Arup Ghosh 3.2k

The extra reads you are observing come from secondary alignments. Can you share the samtools flagstat results?
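As a rough illustration of the counts `samtools flagstat ELW24.bam` would reveal, the sketch below tallies SAM records by type. `tally_alignments` is a hypothetical helper and is far simpler than flagstat itself; it assumes single-end data, where each read should have exactly one primary record. `records` corresponds to what `bedtools bamtofastq` writes (one FASTQ record per alignment), while `distinct_read_names` is the number of reads actually present in the BAM.

```shell
# Tally SAM records on stdin by type (simplified, single-end assumption).
# Prints key=value lines; header lines are skipped.
tally_alignments() {
  awk -F '\t' '
    /^@/ { next }
    {
      flag = $2 + 0
      total++
      if (int(flag / 256) % 2)       secondary++       # 0x100
      else if (int(flag / 2048) % 2) supplementary++   # 0x800
      else {
        primary++
        if (int(flag / 4) % 2) unmapped++              # 0x4
        names[$1] = 1                                  # read name (QNAME)
      }
    }
    END {
      n = 0
      for (q in names) n++
      printf "records=%d\n", total
      printf "secondary=%d\n", secondary
      printf "supplementary=%d\n", supplementary
      printf "primary=%d\n", primary
      printf "primary_unmapped=%d\n", unmapped
      printf "distinct_read_names=%d\n", n
    }'
}
```

It could be run as `samtools view ELW24.bam | tally_alignments`. If subtracting `secondary` and `supplementary` from `records` brings the total back near the original read count, those alignments account for the extra FASTQ records.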
