8 hours ago
curious
I have 30x WGS human samples and want to estimate contamination with the GATK tools, as a cross-check on another program that I am using for the same purpose. This is what I have done:
gatk GetPileupSummaries \
-I my_sample.cram \
-V small_exac_common_3.hg38.vcf.gz \
-L small_exac_common_3.hg38.vcf.gz \
-O my_sample.getpileups.table \
--reference hs38DH.fa
gatk CalculateContamination \
-I my_sample.getpileups.table \
-O my_sample.contamination.table
where small_exac_common_3.hg38.vcf.gz was downloaded from gs://gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz
My main questions are:
1. Is this an appropriate contamination estimation workflow for germline human WGS?
2. If so, is small_exac_common_3.hg38.vcf.gz a reasonable file to pass for the -V and -L arguments?
3. In my_sample.contamination.table I get the columns "sample", "contamination", and "error". If the value of "contamination" is 0.01, does this mean the sample is 1% contaminated?
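To make question 3 concrete, here is a minimal sketch of how I would read the table if the "contamination" and "error" values really are fractions. That interpretation is exactly the assumption I am asking about. The function name contamination_percent is hypothetical, and I am also assuming the table is tab-separated with a single header row and the three columns in the order listed above.

```shell
# Hypothetical helper: print each sample's contamination as a percentage.
# Assumptions (not confirmed): the table is tab-separated, has one header
# row, and its columns are sample, contamination, error, in that order,
# with contamination and error expressed as fractions (0.01 = 1%).
contamination_percent() {
    awk -F'\t' 'NR > 1 {
        # Skip the header row, then scale the fractions to percentages.
        printf "%s\t%.2f%%\t+/- %.2f%%\n", $1, $2 * 100, $3 * 100
    }' "$1"
}
```

I would call it as `contamination_percent my_sample.contamination.table`. Under those assumptions, a row with contamination 0.01 would be reported as 1.00%.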