gatk GetPileupSummaries and CalculateContamination result in NaN on mouse data
1
0
Entering edit mode
2.5 years ago
wormball ▴ 10

Hello!

I ran gatk toolchain including CalculateContamination in galaxy on human exome sequencing data, and it worked fine. However when I try feeding it with murine data (and murine reference files), CalculateContamination gives me this contamination table:

sample  contamination   error
mouse1_tumor    NaN 1.0

And being fed with this result, FilterMutectCalls says:

java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
    at org.broadinstitute.hellbender.utils.MathUtils.log10BinomialProbability(MathUtils.java:934)

I suspect there is a connection between these results.

I tried two vcf files - only for one mouse strain C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz and one for all mice mgp.v5.merged.snps_all.dbSNP142.vcf.gz . I processed them to be biallelic like this:

bcftools plugin fill-tags C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf -Oz -o with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz -- -t AF
bcftools view -Oz -m2 -M2 with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf > biallelic.with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz
/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk IndexFeatureFile -F biallelic.with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz

With one strain file GetPileupSummaries gives no data at all:

#<METADATA>SAMPLE=mouse1_normal                 
contig  position    ref_count   alt_count   other_alt_count allele_frequency

And with panmurine vcf file it gives about 19,538,774 lines:

#SAMPLE=mouse1_tumor
#<METADATA>SAMPLE=mouse1_tumor                  
contig  position    ref_count   alt_count   other_alt_count allele_frequency
1   3001278 1   0   0   0.111111
1   3001309 1   0   0   0.0277778
1   3001310 1   0   0   0.0277778
..........
........
............

However regardless of the vcf file the CalculateContamination contamination table contains only NaN and 1.0 . But segmentation tables are different - with strain specific file it contains no data, and with panmurine file it looks similar to normal result (but not quite):

#SAMPLE=mouse1_tumor
#<METADATA>SAMPLE=mouse1_tumor
contig  start   end minor_allele_fraction
11  3133205 120186319   0.11578047329359388
12  9024112 117728420   0.11578047329359388
13  5864694 119389285   0.14709163137980344
14  8368568 122944938   0.11578047329359388
15  4153960 103503606   0.11578047329359388
16  5009839 96161963    0.11578047329359388
17  5928381 94835879    0.11578047329359388
18  3015925 89073490    0.11578047329359388
19  3996832 61225661    0.11578047329359388
1   4611421 193194496   0.12379274849627589
2   4925592 181917652   0.11578047329359388
3   8936403 154765979   0.11578047329359388
4   3212345 156357539   0.1547816247108552
5   3236897 149625240   0.11578047329359388
6   4754265 149353876   0.11578047329359388
7   3841411 144450962   0.11578047329359388
X   7946841 151850725   0.11578047329359388
8   3621134 127592996   0.11578047329359388
9   7131733 123664738   0.11578047329359388
10  5035173 128723359   0.11578047329359388

and takes about 2.5 hours to execute, which however does not help getting right contamination and FilterMutectCalls results.

Murine commands are like this:

'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'GetPileupSummaries' '-I' 'input.bam' '-V' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102/snp/biallelic.with_af.mgp.v5.merged.snps_all.dbSNP142.vcf.gz' '-L' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102/snp/biallelic.with_af.mgp.v5.merged.snps_all.dbSNP142.vcf.gz' -O '/home/transgeneprep/galaxy/database/objects/4/f/e/dataset_4fe4af55-a216-498d-bcc5-a26cbbd39d0d.dat' '--disable-sequence-dictionary-validation'
'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'CalculateContamination' '-I' '/home/transgeneprep/galaxy/database/objects/0/4/f/dataset_04f08621-9adb-49c5-bec0-ecb21d3f87a2.dat' '-matched' '/home/transgeneprep/galaxy/database/objects/4/f/e/dataset_4fe4af55-a216-498d-bcc5-a26cbbd39d0d.dat' '--tumor-segmentation' '/home/transgeneprep/galaxy/database/objects/a/1/f/dataset_a1f596be-bd1c-4ee3-8a81-06ef6577e267.dat' '-O' '/home/transgeneprep/galaxy/database/objects/e/2/b/dataset_e2bdecb6-2c1b-454a-b81b-3d3f2d9d5269.dat'
'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'FilterMutectCalls' '-R' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102//bwa/Mus_musculus.GRCm38.dna.primary_assembly.fa' '-V' 'input.vcf' '--contamination-table' '/home/transgeneprep/galaxy/database/objects/e/2/b/dataset_e2bdecb6-2c1b-454a-b81b-3d3f2d9d5269.dat' '--tumor-segmentation' '/home/transgeneprep/galaxy/database/objects/a/1/f/dataset_a1f596be-bd1c-4ee3-8a81-06ef6577e267.dat' '--orientation-bias-artifact-priors' '/home/transgeneprep/galaxy/database/objects/0/7/f/dataset_07f1c9fd-bd59-4c69-8ff5-75aad7abf42a.dat' '-O' '/home/transgeneprep/galaxy/database/objects/3/0/5/dataset_305ecc93-51f2-4168-9488-303d05dd9731.dat' 

Hominine commands are like this:

'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'GetPileupSummaries' '-I' 'input.bam' '-V' '/home/transgen/galaxy/tools/melanoma_tools/genome/ALL.wgs.biallelic.exons.af.vcf.gz' '-L' '/home/transgen/galaxy/tools/melanoma_tools/genome/ALL.wgs.biallelic.exons.af.vcf.gz' -O '/home/transgen/galaxy/database/objects/c/9/c/dataset_c9c268ae-784b-469a-92f4-96bf2c0d8a5c.dat' '--disable-sequence-dictionary-validation'
'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'CalculateContamination' '-I' '/home/transgen/galaxy/database/objects/c/1/d/dataset_c1d59b22-ab10-450c-adbf-2e26dcf59c29.dat' '-matched' '/home/transgen/galaxy/database/objects/c/9/c/dataset_c9c268ae-784b-469a-92f4-96bf2c0d8a5c.dat' '--tumor-segmentation' '/home/transgen/galaxy/database/objects/3/e/d/dataset_3edec44a-5833-45ca-a269-d00daf80a7b3.dat' '-O' '/home/transgen/galaxy/database/objects/4/1/0/dataset_41054645-14bf-460b-91a1-4a32196e1454.dat'
'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'FilterMutectCalls' '-R' '/home/transgen/galaxy/tools/melanoma_tools/genome/hg38.analysisSet.fa' '-V' 'input.vcf' '--contamination-table' '/home/transgen/galaxy/database/objects/4/1/0/dataset_41054645-14bf-460b-91a1-4a32196e1454.dat' '--tumor-segmentation' '/home/transgen/galaxy/database/objects/3/e/d/dataset_3edec44a-5833-45ca-a269-d00daf80a7b3.dat' '--orientation-bias-artifact-priors' '/home/transgen/galaxy/database/objects/e/f/7/dataset_ef7eba53-add9-49e3-b95c-86a57e08b028.dat' '-O' '/home/transgen/galaxy/database/objects/2/7/c/dataset_27ce06ec-f75b-4d70-8f22-259e27ed881d.dat'

What am I doing wrong?

Thanks in advance.

gatk mouse • 1.4k views
ADD COMMENT
0
Entering edit mode
2.5 years ago
wormball ▴ 10

I filtered my vcf to contain only exons like it was in the human case, and now it magically works without errors, however the contamination table now says there is exactly 0 contamination:

sample  contamination   error
mouse1_e1t1_tumor   0.0 0.0

Segmentation:

#<METADATA>SAMPLE=mouse1_e1t1_tumor
contig  start   end minor_allele_fraction
11  3133205 98831362    0.11578047329359388
12  31928794    112604342   0.11578047329359388
13  21423608    115089981   0.11578047329359388
14  37722663    105892748   0.1591121893174376
15  41173680    101487948   0.11578047329359388
16  11073527    96161963    0.1457295541268666
17  14059189    69387485    0.14376941012509467
18  3027563 59032956    0.11578047329359388
19  9607415 61225661    0.11578047329359388
1   48882180    171358093   0.17082039324993695
2   41987597    181898448   0.11578047329359388
3   14974085    154765979   0.11578047329359388
4   8828239 156350622   0.19067737705075136
5   15472148    145875347   0.11578047329359388
6   18294560    135380297   0.14710943035635266
7   4937407 144420760   0.11578047329359388
X   66778504    124133726   0.2725560946587188
8   8667164 122894937   0.11578047329359388
9   3187774 123517966   0.11578047329359388
10  29535206    127193693   0.11578047329359388
ADD COMMENT

Login before adding your answer.

Traffic: 1766 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6