Question: Number of reads in different alignments
marongiu.luigi (Germany, Mannheim, UMM) wrote, 2.2 years ago:

Dear all,

Just to confirm: is it normal to get a different number of reads when aligning the same reads against two different genomes?

I think the answer is YES, because the aligner reports different hits depending on the reference genome. Also, the number of reads in the starting FASTQ file differs from the number of records in the SAM file, so the total reported for the SAM is not tied to the read count of the starting file.

For instance, I have:

x=$(zcat <file>_1.fq.gz | wc -l)
echo $x
1 190 389 447 # same number on the paired file
samtools flagstat <align_first_reference>.sam |  grep 'total' | cut -d+ -f1
2 383 795 990
samtools flagstat <align_second_reference>.sam |  grep 'total' | cut -d+ -f1
2 381 177 452

Thank you.


You also need to account for secondary alignments that are bound to be different as well.
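To see how much of the flagstat total comes from secondary or supplementary records, one option is to split the SAM records by FLAG bit. The sketch below is a minimal example that assumes plain-text SAM on stdin; `count_sam_records` is a hypothetical helper name, not a samtools command. With samtools itself, `samtools view -c -F 0x900 <file>.sam` should give the primary-only count.

```shell
# Hypothetical helper: reads SAM text on stdin and prints
# "<primary> <secondary> <supplementary>" record counts.
count_sam_records() {
  awk -F'\t' '
    /^@/ { next }                            # skip header lines
    {
      flag = $2
      if (int(flag / 2048) % 2)     supp++   # 0x800: supplementary alignment
      else if (int(flag / 256) % 2) sec++    # 0x100: secondary alignment
      else                          prim++   # everything else (mapped or unmapped)
    }
    END { printf "%d %d %d\n", prim, sec, supp }
  '
}

# Usage (file name is a placeholder):
#   count_sam_records < alignment.sam
```

For paired-end data the primary count would be expected to equal the sum of reads in the two FASTQ files, while the secondary and supplementary counts can change from one reference to another.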

written 2.2 years ago by genomax
Joe (United Kingdom) wrote, 2.2 years ago:

If they're different genomes, of course. The sequence won't be exactly the same so it's to be expected that different numbers will map.


Also, filled gaps in newer versions of the reference may cause reads to become multimappers and potentially trigger the aligner to flag them as unmapped, given they align to too many regions. HISAT2 has such an issue if I remember correctly, producing notably different alignments when using hg19 vs hg38. What kind of references are you using?
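One way to check whether a reference leaves more reads flagged as unmapped is to count records with FLAG bit 0x4 in each SAM file and compare. This is a rough sketch assuming plain-text SAM on stdin; `count_unmapped` is a hypothetical helper name. With samtools, `samtools view -c -f 4 <file>.sam` should report the same number.

```shell
# Hypothetical helper: reads SAM text on stdin and prints the number
# of records whose FLAG has bit 0x4 (segment unmapped) set.
count_unmapped() {
  awk -F'\t' '
    /^@/ { next }                 # skip header lines
    int($2 / 4) % 2 { n++ }       # 0x4: segment unmapped
    END { printf "%d\n", n }
  '
}

# Usage (file names are placeholders):
#   count_unmapped < first_reference.sam
#   count_unmapped < second_reference.sam
```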

written 2.2 years ago by ATpoint