Trouble with best practices
6.2 years ago
erarroji • 0

Hi

I am new to NGS tools. I tried to follow the best practices workflow for my exome reads from tumor samples: I aligned with bwa mem, sorted with samtools, marked duplicates and recalibrated base qualities with GATK 4, and finally called variants with Mutect2. Although the software ran successfully, I noticed that the MarkDuplicates metrics report only 0.3% duplicates, which I double-checked with samtools flagstat. So I marked duplicates with samtools on the same BAM, which gave 30% duplicates.

Later, when I did the variant calling, the BAM marked with samtools gave 297061 mutations against 50370 from the BAM marked with GATK.
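
Raw totals alone do not say how the two call sets relate, so one way to look closer is to count which calls are shared and which appear in only one file. The following is a minimal sketch, assuming plain-text VCF content; the function names `load_variant_keys` and `compare_call_sets` are hypothetical helpers, not part of GATK or samtools, and the sketch ignores the FILTER column.

```python
def load_variant_keys(vcf_text):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF text."""
    keys = set()
    for line in vcf_text.splitlines():
        # Skip blank lines and header lines ("##..." and "#CHROM...").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # A record can list several ALT alleles separated by commas;
        # count each allele as its own call.
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def compare_call_sets(vcf_a_text, vcf_b_text):
    """Count calls shared by both VCFs and calls unique to each one."""
    a = load_variant_keys(vcf_a_text)
    b = load_variant_keys(vcf_b_text)
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }
```

To use it on real files, read each VCF into a string first, for example with `open(path).read()`, or `gzip.open(path, "rt").read()` for a compressed file. If nearly all calls from the smaller set are also in the larger one, the difference comes from extra calls rather than from two unrelated results.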

I am not sure which file is the right one. What could I be doing wrong? How can I check which file was processed correctly?

Thank you

Ernesto Rojas

samtools WES NGS GATK • 1.5k views

One should never expect different variant calling pipelines to come up with the same number of variants, of course. Each program has its own idea about which QC thresholds are important. You also have not explained in detail the steps that you took (with code), so any comments here are going to be speculative.

As per Istvan, if you're new to this, then better to trust the GATK calls for now.


Thank you, I will trust GATK. The variant calling was done with the same tool (Mutect2); I only changed the duplicate-marking tool. In all cases I used default values. Could that be what is causing the trouble?


Ah, so, you called variants with Mutect2 in both situations; however, in one pipeline you removed duplicates with samtools rmdup?

Edit: I would go by the BAM with duplicates marked by Picard MarkDuplicates.


First, make sure that the numbers for duplicates are really off: 0.3 could mean 30% if it is expressed as a fraction.
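
As a rough illustration of that check: the Picard/GATK MarkDuplicates metrics file has a `PERCENT_DUPLICATION` column that, despite its name, holds a fraction between 0 and 1. The sketch below pulls that value out of the metrics text and converts it to a percentage. It assumes the usual tab-separated layout (a `## METRICS CLASS` line, then a header row, then one row per library); `read_duplication_fraction` is a hypothetical helper name.

```python
def read_duplication_fraction(metrics_text):
    """Return {library: PERCENT_DUPLICATION} from MarkDuplicates metrics text.

    The values are fractions (0.0 to 1.0), as written in the file.
    """
    lines = metrics_text.splitlines()
    result = {}
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            # The line right after the METRICS CLASS marker is the column header.
            header = lines[i + 1].split("\t")
            dup_col = header.index("PERCENT_DUPLICATION")
            lib_col = header.index("LIBRARY")
            # Data rows follow until the first blank line.
            for row in lines[i + 2:]:
                if not row.strip():
                    break
                fields = row.split("\t")
                result[fields[lib_col]] = float(fields[dup_col])
            break
    return result


def as_percent(fraction):
    """Convert a fraction such as 0.25 to a percentage such as 25.0."""
    return fraction * 100.0
```

With this, a reported value of 0.3 would correspond to roughly 30% duplicates, not 0.3%.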

As for the mutations, I would trust the ones produced with GATK. It probably has more corrections built into it that remove more false positives.

Welcome to bioinformatics :-)


Thank you. I did check the duplicates: they were expressed as a fraction, and I also checked with samtools flagstat, doing the calculation from the raw values.
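
For reference, the calculation from the raw flagstat values can be sketched as follows. This assumes the standard `samtools flagstat` text layout, where each line starts with `<QC-passed> + <QC-failed>` followed by a label; `duplicate_fraction_from_flagstat` is a hypothetical helper name, and only the QC-passed counts are used.

```python
import re


def duplicate_fraction_from_flagstat(flagstat_text):
    """Return duplicates / total reads from `samtools flagstat` output text."""
    total = None
    duplicates = None
    for line in flagstat_text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.*)", line)
        if not match:
            continue
        passed = int(match.group(1))  # QC-passed count
        label = match.group(3)
        if label.startswith("in total"):
            total = passed
        elif label.startswith("duplicates"):
            # Matches the "duplicates" line only, not "primary duplicates".
            duplicates = passed
    if total is None or duplicates is None or total == 0:
        raise ValueError("could not find total and duplicate counts")
    return duplicates / total
```

Note that the flagstat total counts every record in the BAM, including secondary, supplementary and unmapped reads, so this fraction may differ slightly from the MarkDuplicates metric, which is computed over the reads it examined.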
