Question

My FASTQ files (paired-end) have different read numbers

0

13 months ago

kimgeng • 0

Hello, I'm mapping my data and calling variants, but I have some questions about the data.

Here is the "samtools flagstat" output for my BAM file (created with bwa aln).

I found that the two files have different read counts, even though this is paired-end data.

I believed that paired-end sequencing data must have the same number of reads in both files, but mine does not. Also, as you can see, there are a lot of singletons and only a very small proportion of reads are "properly paired".
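The read1/read2 numbers reported by flagstat describe the BAM file, so counting records in the FASTQ files themselves can help separate an input problem from a mapping problem. Below is a minimal sketch of such a check, assuming standard 4-line FASTQ records (no wrapped sequence lines); the function names and file names are hypothetical, not part of any tool mentioned here.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: files ending in .gz are gzip-compressed, others are plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(
            f"{path}: {n_lines} lines is not a multiple of 4 (truncated file?)"
        )
    return n_lines // 4


def compare_pair(r1_path, r2_path):
    """Return (reads in R1, reads in R2, whether the counts match)."""
    n1 = count_fastq_reads(r1_path)
    n2 = count_fastq_reads(r2_path)
    return n1, n2, n1 == n2


# Example call (hypothetical file names):
# n1, n2, same = compare_pair("sample_R1.fastq.gz", "sample_R2.fastq.gz")
```

If the two counts match, the FASTQ pair itself is consistent and the difference arises somewhere between mapping and the BAM file; if they differ, the files were probably truncated or filtered independently.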

Is there a problem with my data? Should I discard it and try again?

[Image: samtools flagstat output]

fastq mapping • 1.0k views

updated 13 months ago by colindaven 6.4k • written 13 months ago by kimgeng • 0

0

Only 2.9% are properly paired. This is bad. Could you tell us a bit more about your sequencing project and application, and especially the reference used, so we can help?

13 months ago by colindaven 6.4k

score 0 · Answer 1 · 2023-03-21

0

13 months ago

ntsopoul ▴ 60

Yes, this is normal: depending on how the BAM file was generated, reads that did not align are stored in the .bam file alongside the properly aligned ones.
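To see which kinds of records make up the counts, it can help to tally the SAM FLAG values directly. The sketch below is a rough approximation of how flagstat-style categories follow from the standard FLAG bits; the function name and the example flags are made up for illustration, and it is not a reimplementation of samtools.

```python
from collections import Counter

# Standard SAM FLAG bits (from the SAM specification).
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
READ1, READ2, SECONDARY, SUPPLEMENTARY = 0x40, 0x80, 0x100, 0x800


def classify_flags(flags):
    """Tally flagstat-like categories from an iterable of integer SAM flags."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        if not flag & UNMAPPED:
            counts["mapped"] += 1
        # Pairing categories are tallied for primary records only here.
        if flag & (SECONDARY | SUPPLEMENTARY) or not flag & PAIRED:
            continue
        counts["paired"] += 1
        if flag & READ1:
            counts["read1"] += 1
        if flag & READ2:
            counts["read2"] += 1
        if flag & UNMAPPED:
            continue
        if flag & PROPER_PAIR:
            counts["properly_paired"] += 1
        if flag & MATE_UNMAPPED:
            # The read itself mapped but its mate did not.
            counts["singletons"] += 1
    return counts


# Made-up example: one proper pair (99, 147) plus a mapped read1 (73)
# whose mate (133) is unmapped.
example = classify_flags([99, 147, 73, 133])
```

The flags themselves are the second column of `samtools view` output, so they can be extracted with something like `samtools view file.bam | cut -f2` and passed to the function as integers.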

0

Oh, I see, thank you. However, I want to make VCF files with GATK HaplotypeCaller and GenotypeGVCFs, and I have run into a problem: I have about 2.6 × 10^8 mapped reads but, as you can see, only 7.9 × 10^6 properly paired reads.

Maybe the ProperlyPairedReadFilter of HaplotypeCaller can restrict calling to properly paired reads only, but in that case I get a very small number of variants. So I wonder which approach is more appropriate: calling from only the properly paired reads, or calling from all of the paired mapped reads.

13 months ago by kimgeng • 0