The number of reads from the BAM file is more than the number of reads from its original FASTQ file
19 months ago
Gary ▴ 480

Hi,

I have a FASTQ file (named Origin_ELW24.fastq.gz) with 107,856,041 single-end 75bp reads. After trimming and alignment with STAR, we get a BAM file (named ELW24.bam). I use the commands below to convert the BAM file to a new FASTQ file (named New_ELW24.fastq.gz). The new FASTQ file contains 122,444,250 reads, which is more than the number of reads in the original FASTQ file. I don't understand why the number of reads would increase after trimming and alignment. Could you help me? Many thanks.

bedtools bamtofastq -i ELW24.bam -fq New_ELW24.fastq
gzip -c New_ELW24.fastq > New_ELW24.fastq.gz

Best,

Gary

RNA-Seq alignment BAM FASTQ • 484 views
19 months ago
Fatima ▴ 960

bedtools bamtofastq:

each alignment in the BAM file is converted to a FASTQ record in the -fq file.

It's probably because a read that is not uniquely mapped can be aligned to the reference more than once (multi-mapping reads), and each of those alignments becomes its own FASTQ record.

https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
https://bedtools.readthedocs.io/en/latest/content/tools/bamtofastq.html
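
To make the multi-mapping effect concrete, here is a small sketch on toy SAM records. The read names, positions and sequences are made up for illustration and are not taken from ELW24.bam. It counts all alignment records versus only the primary ones, using the SAM FLAG bits 0x100 (secondary) and 0x800 (supplementary).

```shell
# Toy SAM records (hypothetical data): read1 maps once; read2 maps twice,
# and its second alignment carries the "secondary" flag 256 (0x100).
toy_sam() {
  printf 'read1\t0\tchr1\t100\t255\t5M\t*\t0\t0\tACGTA\tIIIII\n'
  printf 'read2\t0\tchr1\t200\t3\t5M\t*\t0\t0\tACGTA\tIIIII\n'
  printf 'read2\t256\tchr2\t300\t3\t5M\t*\t0\t0\tACGTA\tIIIII\n'
}

# Every alignment record: what a record-by-record conversion would emit.
total=$(toy_sam | awk 'END { print NR }')

# Only primary records: FLAG (column 2) has neither bit 0x100 (secondary)
# nor bit 0x800 (supplementary) set.
primary=$(toy_sam \
  | awk -F '\t' 'int($2 / 256) % 2 == 0 && int($2 / 2048) % 2 == 0' \
  | awk 'END { print NR }')

echo "alignment records: $total, primary records: $primary"
```

Two reads produce three alignment records here, so converting every record to FASTQ would yield three entries instead of two.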

19 months ago

The extra reads you are observing are due to secondary alignments. Can you share the samtools flagstat results?
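
A possible way to check this and to convert only primary alignments is sketched below. It assumes samtools and bedtools are installed and that ELW24.bam is in the current directory; ELW24.primary.bam is a hypothetical intermediate file name. The filter value combines the secondary (0x100) and supplementary (0x800) FLAG bits.

```shell
BAM=ELW24.bam                  # BAM file name from the question
SKIP=$((0x100 | 0x800))        # secondary + supplementary FLAG bits = 2304

# Run only if the tools and the input file are available.
if command -v samtools >/dev/null 2>&1 \
   && command -v bedtools >/dev/null 2>&1 \
   && [ -f "$BAM" ]; then
  # Summary of primary, secondary and supplementary alignment counts.
  samtools flagstat "$BAM"

  # Number of primary alignment records (one per read for single-end data).
  samtools view -c -F "$SKIP" "$BAM"

  # Keep only primary alignments, then convert as in the question.
  samtools view -b -F "$SKIP" "$BAM" > ELW24.primary.bam
  bedtools bamtofastq -i ELW24.primary.bam -fq New_ELW24.fastq
  gzip -c New_ELW24.fastq > New_ELW24.fastq.gz
fi
```

Even after this filtering, the count may not match the original 107,856,041 reads exactly: trimming can discard reads, and STAR, as far as I know, does not write unmapped reads to the BAM file unless asked to.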
