My paired-end FASTQ files have different read counts
16 months ago
kimgeng • 0

Hello, I'm mapping my data and calling variants, but I have some questions about the data.

Here is the result of "samtools flagstat" on my BAM file (created with bwa aln).

I found that the two files have different read counts, even though this is paired-end data.

I believed that paired-end sequencing data must contain the same number of reads in both files, but mine does not. As you can see, there are also a lot of singletons and only a very small proportion of "properly paired" reads.
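One way to check this assumption on the raw input, before any mapping, is to count the records in each FASTQ file directly. The following is only a minimal sketch: it assumes the common four-lines-per-record FASTQ layout, and the function names and the file names in the usage note are hypothetical, not taken from this thread.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count records, assuming 4 lines per record (no wrapped sequences)."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A remainder usually points to a truncated or corrupted file.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def compare_pair(r1_path, r2_path):
    """Return (reads in R1, reads in R2, whether the two counts match)."""
    n1 = count_reads(r1_path)
    n2 = count_reads(r2_path)
    return n1, n2, n1 == n2
```

For example, `compare_pair("sample_R1.fastq.gz", "sample_R2.fastq.gz")` (hypothetical file names) returns both counts and a flag saying whether they agree. If the counts already differ here, the mismatch comes from the input files rather than from the aligner.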

Is there a problem with my data? Should I discard it and try again?


fastq mapping • 1.2k views

Only 2.9% of the reads are properly paired, which is bad. Could you tell us a bit more about your sequencing project and application, and especially the reference you used, so we can help?

16 months ago
ntsopoul ▴ 60

Yes, this is normal: depending on how you generate the BAM file, the non-aligning reads are saved in the .bam file alongside the properly aligned ones.
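To illustrate why unaligned reads still show up in the totals, here is a rough sketch that tallies flagstat-like categories from integer SAM flags. The bit values come from the SAM specification. The function name and the category labels are made up for this example, and this is not a reimplementation of samtools flagstat, whose exact counting rules may differ.

```python
from collections import Counter

# Bit values defined by the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def tally_flags(flags):
    """Tally flagstat-like categories from an iterable of integer SAM flags."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1  # unmapped records are counted here too
        if flag & (SECONDARY | SUPPLEMENTARY):
            # Extra alignment records of a read already seen; skip the
            # per-read categories below so reads are not counted twice.
            counts["secondary_or_supplementary"] += 1
            continue
        is_mapped = not (flag & UNMAPPED)
        if is_mapped:
            counts["mapped"] += 1
        if flag & PAIRED:
            if flag & READ1:
                counts["read1"] += 1
            if flag & READ2:
                counts["read2"] += 1
            if is_mapped and (flag & PROPER_PAIR):
                counts["properly_paired"] += 1
            if is_mapped and (flag & MATE_UNMAPPED):
                counts["singletons"] += 1
    return counts
```

As a small usage example, `tally_flags([99, 147, 73, 133])` describes one properly paired pair (flags 99 and 147) plus one pair in which only read 1 aligned (flags 73 and 133). The unmapped read 2 still counts towards "total" and "read2", but not towards "mapped".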


Oh, I see, thank you. However, I want to make VCF files with GATK HaplotypeCaller and GenotypeGVCFs, and I have run into a problem: I have about 2.6 × 10^8 mapped reads but, as you can see, only 7.9 × 10^6 properly paired reads.

Maybe HaplotypeCaller's ProperlyPairedReadFilter can restrict calling to properly paired reads, but in that case I get very few variants. So I wonder which approach is more appropriate: calling only from properly paired reads, or calling from all mapped paired reads.
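To put the trade-off in numbers, here is a quick back-of-the-envelope calculation using only the two approximate read counts quoted above. It is a sketch of the arithmetic, not output from any tool.

```python
# Approximate counts quoted in this thread.
mapped_reads = 2.6e8      # "about 2.6 x 10^8 mapped reads"
properly_paired = 7.9e6   # "7.9 x 10^6 properly paired"

# Share of mapped reads that would survive a properly-paired-only filter.
fraction_kept = properly_paired / mapped_reads
print(f"kept: {fraction_kept:.1%}, discarded: {1 - fraction_kept:.1%}")
# prints "kept: 3.0%, discarded: 97.0%"
```

So restricting calling to properly paired reads would drop roughly 97% of the mapped reads, which is consistent with seeing very few variants.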
