Variant calling in tumor/normal pairs is calling heterozygous positions
0
0
Entering edit mode
7 weeks ago

I ran two somatic variant calling programs (mutect and strelka) on some tumor/normal pairs. In parallel, I also did RNA seq analysis on the same tissue. As I looked at the results closely I found that a position such as the following was called a variant at the DNA level:

chr11:62529120 C->A

Yet, at the RNA level:

  • Normal: 351C, 446A
  • Tumor: 61C, 110A

So it seems to me that this is nothing more than a heterozygous position yet the variant calling programs are calling this a variant. Why is this and how can I filter them out? I am just interested in variants that are unique for the tumor!

Command line arguments for Mutect and FilterMutectCalls:

##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --f1r2-tar-gz PSC001_TUMOR_vs_PSC001_NORMAL.mutect2.chr11_51078349-54425074.f1r2.tar.gz
--normal-sample PSC001_PSC001_NORMAL --panel-of-normals /gpfs1/home/r/b/rbarrant/projects/moonshot/work/a0/51712651714bb37dee782abaec2fd1/1000g_pon.hg38.vcf.gz
--germline-resource /gpfs1/home/r/b/rbarrant/projects/moonshot/work/a0/51712651714bb37dee782abaec2fd1/af-only-gnomad.hg38.vcf.gz
--output PSC001_TUMOR_vs_PSC001_NORMAL.mutect2.chr11_51078349-54425074.vcf.gz
--intervals chr11_51078349-54425074.bed --input PSC001_NORMAL.recal.cram --input PSC001_TUMOR.recal.cram --reference Homo_sapiens_assembly38.fasta --tmp-dir . --f1r2-median-mq 50
--f1r2-min-bq 20 --f1r2-max-depth 200 --flow-likelihood-parallel-threads 0 --flow-likelihood-optimized-comp false --trim-to-haplotype true --exact-matching false
--flow-use-t0-tag false --flow-probability-threshold 0.003 --flow-remove-non-single-base-pair-indels false --flow-remove-one-zero-probs false --flow-quantization-bins 121 --flow-fill-empty-bins-value 0.001 --flow-symmetric-indel-probs false --flow-report-insertion-or-deletion false --flow-disallow-probs-larger-than-call false --flow-lump-probs false --flow-retain-max-n-probs-base-format false --flow-probability-scaling-factor 10 --flow-order-cycle-length 4 --keep-boundary-flows false --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --mutect3-training-mode false --mutect3-ref-downsample 10 --mutect3-alt-downsample 20 --mutect3-non-artifact-ratio 20 --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --base-qual-correction-factor 5 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --independent-mates false --flow-mode NONE --disable-adaptive-pruning false --kmer-size 10 --kmer-size 25
--dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end-to-recover -1
--error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --dragstr-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false
--pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45
--disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqualification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --flow-hmm-engine-min-indel-adjust 6
--flow-hmm-engine-flat-insertion-penatly 45 --flow-hmm-engine-flat-deletion-penatly 45 --pileup-detection false --use-pdhmm false --use-pdhmm-overlap-optimization false --make-determined-haps-from-pd-code false --print-pileupcalling-status false --fallback-gga-if-pdhmm-fails true
--pileup-detection-enable-indel-pileup-calling false --pileup-detection-active-region-phred-threshold 0.0 --num-artificial-haplotypes-to-add-per-allele 5 --artifical-haplotype-filtering-kmer-size 10 --pileup-detection-snp-alt-threshold 0.1 --pileup-detection-indel-alt-threshold 0.1 --pileup-detection-absolute-alt-depth 0.0 --pileup-detection-snp-adjacent-to-assembled-indel-range 5 --pileup-detection-snp-basequality-filter 12 --pileup-detection-bad-read-tolerance 0.0 --pileup-detection-proper-pair-read-badness true --pileup-detection-edit-distance-read-badness-threshold 0.08 --pileup-detection-chimeric-read-badness true --pileup-detection-template-mean-badness-threshold 0.0 --pileup-detection-template-std-badness-threshold 0.0 --pileup-detection-filter-assembly-alt-bad-read-tolerance 0.0 --pileup-detection-edit-distance-read-badness-for-assembly-filtering-threshold 0.12 --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --override-fragment-softclip-check false
--min-base-quality-score 10 --smith-waterman FASTEST_AVAILABLE --emit-ref-confidence NONE --max-mnp-distance 1 --force-call-filtered-alleles false --reference-model-deletion-quality 30 --soft-clip-low-quality-ends false
--allele-informative-reads-overlap-margin 2 --smith-waterman-dangling-end-match-value 25 --smith-waterman-dangling-end-mismatch-penalty -50 --smith-waterman-dangling-end-gap-open-penalty -110 --smith-waterman-dangling-end-gap-extend-penalty -6 --smith-waterman-haplotype-to-reference-match-value 200 --smith-waterman-haplotype-to-reference-mismatch-penalty -150 --smith-waterman-haplotype-to-reference-gap-open-penalty -260 --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 --smith-waterman-read-to-haplotype-match-value 10 --smith-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --flow-assembly-collapse-hmer-size 0 --flow-assembly-collapse-partial-mode false --flow-filter-alleles false --flow-filter-alleles-qual-threshold 30.0
--flow-filter-alleles-sor-threshold 3.0 --flow-filter-lone-alleles false --flow-filter-alleles-debug-graphs false
--min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100
--padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0
--interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1
--disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false
--gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.5.0.0",Date="February 15, 2025 at 8:26:12 AM GMT">

##GATKCommandLine=<ID=FilterMutectCalls,CommandLine="FilterMutectCalls --output PSC001_TUMOR_vs_PSC001_NORMAL.mutect2.filtered.vcf.gz --contamination-table PSC001_TUMOR_vs_PSC001_NORMAL.mutect2.contamination.table
--tumor-segmentation PSC001_TUMOR_vs_PSC001_NORMAL.mutect2.segmentation.table
--orientation-bias-artifact-priors PSC001_TUMOR_vs_PSC001_NORMAL.mutect2.artifactprior.tar.gz --variant PSC001_TUMOR_vs_PSC001_NORMAL.mutect2.vcf.gz --reference Homo_sapiens_assembly38.fasta --tmp-dir . --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0 --false-discovery-rate 0.05
--initial-threshold 0.1 --mitochondria-mode false --microbial-mode false --max-events-in-region 2 --max-alt-allele-count 1
--unique-alt-read-count 0 --min-median-mapping-quality -1 --min-median-base-quality 20 --max-median-fragment-length-difference 10000 --min-median-read-position 1 --max-n-ratio Infinity
--min-reads-per-strand 0 --min-allele-fraction 0.0 --contamination-estimate 0.0 --log-snv-prior -13.815510557964275 --log-indel-prior -16.11809565095832 --log-artifact-prior -2.302585092994046 --normal-p-value-threshold 0.001 --min-slippage-length 8 --pcr-slippage-rate 0.1 --distance-on-haplotype 100 --long-indel-length 5 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0
--interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1
--disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false
--gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false",Version="4.5.0.0",Date="February 15, 2025 at 1:12:56 PM GMT">
variant • 442 views
ADD COMMENT
0
Entering edit mode

Of course you could tweak some parameters to reject this variant. Share other details, like:

  • Mutect2 params
  • Experimental design. Tumor histology? FFPE or fresh? Which sample for control?
  • Pileup results from DNA. Be sure to exclude non-primary and duplicate reads, and enforce some quality threshold (like MAPQ > 20)
ADD REPLY
0
Entering edit mode

Thank you for your reply. For mutect2 I am using the default parameters (I am using the nf-core sarek pipeline and I am also using strelka). The design is just tumor/normal pairs from the same patient. The tumor is FFPE and the normal is blood. We don't have a control sample. I do have the pileup results for DNA with the aspects that you mentioned. All that said, I want the variants that are exclusive for the tumor but I seem to be finding a lot of heterozygous positions which are common for both normal and tumor. Any suggestions?

ADD REPLY
0
Entering edit mode

Would be useful to share the command line arguments used for mutect2 and FilterMutectCalls. You could find them in the header of the VCF. Share the pileup results from DNA like you did with the RNA.

ADD REPLY
0
Entering edit mode

Thank you. Pileup:

chr11   62529120        N       42      aCacaccCCaAAaccACaACaAaaCcACacCccaCCCaaAaaBeA@=?@ECAfBBDDDh`eEBCBAEE6eBECEEBeCABC@DC

and I place the command line arguments for FilterMutectCalls and mutect2 on the main question.

ADD REPLY
0
Entering edit mode

This is quite strange. Are you performing the variant calling using a small interval (ex. subset of chr11)?

ADD REPLY

Login before adding your answer.

Traffic: 3235 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6