Question: Reference and dbSNP incompatibility issue (MuTect2)
umn_bist wrote (3.9 years ago):

When I try to run MuTect2 (from GATK), I get the error shown below.

Is there a link to an (old) dbSNP that is compatible with UCSC's hg19 assembly?

EDIT: I cannot post the error message because Biostar says that it isn't in English. I used the dbSNP file from NCBI (ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/):

  00-All.vcf.gz

and I am using the ucsc.hg19.fasta reference assembly.

##### ERROR   dbsnp contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1, NC_007605]
##### ERROR   reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
modified 3.9 years ago by Chris Miller • written 3.9 years ago by umn_bist

Hi,

Just one addition to what Chris has already said: the mitochondrial sequence in the UCSC version differs from the one in the b37/1000G/Ensembl version. So if you stick to chromosomes 1-22, X and Y only, then adding or removing the 'chr' prefix is OK.

Otherwise, take care with the mitochondrial data, and also with the alternate/unplaced contigs. Those are also different in the UCSC version.

When I analyze WES data, since the (Agilent) capture kit is not designed to capture mitochondrial DNA anyway, I just select 1-22, X and Y. The UCSC data/sequence is then smoothly interchangeable with b37/1000G.
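
The restriction described above (keep only 1-22, X and Y, then add the 'chr' prefix) might be sketched as follows. This is a minimal sketch that works on plain VCF text lines; the function name `to_ucsc_primary` is hypothetical, and dropping the `##contig` header lines of the discarded contigs is an assumption about what a downstream tool expects.

```python
import re

# Primary chromosomes in dbSNP/b37 naming (no 'chr' prefix).
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}

# Matches a VCF contig header line and captures the contig ID.
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def to_ucsc_primary(lines):
    """Yield VCF lines restricted to 1-22, X, Y, with 'chr' prefixed.

    Hypothetical helper: MT and all GL/unplaced contigs are dropped
    rather than renamed.
    """
    for line in lines:
        header = CONTIG_RE.match(line)
        if header:
            # Keep and rename contig headers only for primary chromosomes.
            if header.group(1) in PRIMARY:
                yield line.replace("<ID=", "<ID=chr", 1)
            continue
        if line.startswith("#"):
            # Other header lines pass through unchanged.
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        if chrom in PRIMARY:
            yield "chr" + chrom + sep + rest
```

The function could be fed the lines of a decompressed copy of the dbSNP VCF. Because MT is dropped, the remaining records keep the order 1-22, X, Y, which matches the relative order of chr1-chr22, chrX, chrY in the reference contig list shown in the error message. The output would still need to be compressed and indexed again.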

written 3.9 years ago by Amitm
Chris Miller (Washington University in St. Louis, MO) wrote (3.9 years ago):

This is the same as your previous problems. You'll either need to change the dbSNP file or change your data and reference fasta. The former is probably easier: you'll just need to add "chr" where appropriate, change "MT" to "chrM", and convert between the GL contig names.
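
The renaming described above could be driven by a lookup table built from the two contig lists in the error message. The following is a minimal sketch, assuming the GL contigs can be paired by their six-digit accession number (for example GL000207.1 with chr18_gl000207_random); `build_rename_map` is a hypothetical helper, not part of GATK or of any dbSNP tooling. Contigs with no counterpart in the reference (such as NC_007605) are left out of the map.

```python
import re


def build_rename_map(dbsnp_contigs, reference_contigs):
    """Map dbSNP-style contig names to UCSC-style names present in the reference.

    Hypothetical helper; GL contigs are paired purely by accession number,
    which is an assumption based on the names in the error message.
    """
    ref_set = set(reference_contigs)

    # Index UCSC names such as 'chr1_gl000191_random' or 'chrUn_gl000211'
    # by their six-digit GL number.
    gl_index = {}
    for name in reference_contigs:
        m = re.search(r"_gl(\d{6})", name)
        if m:
            gl_index[m.group(1)] = name

    mapping = {}
    for contig in dbsnp_contigs:
        gl = re.fullmatch(r"GL(\d{6})\.\d+", contig)
        if contig == "MT":
            candidate = "chrM"
        elif gl:
            candidate = gl_index.get(gl.group(1))
        else:
            candidate = "chr" + contig
        # Only keep names that really exist in the reference.
        if candidate in ref_set:
            mapping[contig] = candidate
    return mapping
```

Relabelling MT as chrM changes only the name. As noted elsewhere in this thread, the mitochondrial sequence itself differs between the UCSC and b37 builds, so those records may be better dropped than renamed.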

written 3.9 years ago by Chris Miller

There is now a separate dbSNP download section with "corrected" contig names: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/GATK/

modified 3.5 years ago • written 3.5 years ago by igor

This is pretty useful. THX

written 3.1 years ago by Mdeng

Thanks for your help, Chris. Yes, these have all been small validation errors stemming from the main issue of not having the original reference.

I did, however, get hold of a working reference genome (ucsc.hg19) and its corresponding dbSNP and COSMIC VCFs. But having gone through the formatting process (sorting, indexing, adding read groups) and finally getting a VCF file with no mutations detected, I think I will resort to the second-best option. Do you have any recommendations other than MuTect2 if I want to rely on a single tool? FreeBayes/VarScan2/SomaticSniper? GATK has been a very difficult, time-consuming (and eye-opening) experience thus far. Thanks again for your help.

EDIT: I find the samtools mpileup function much more comfortable to use (but it seems that it is horrible for somatic variant calling).

modified 3.9 years ago • written 3.9 years ago by umn_bist
Chris Miller (Washington University in St. Louis, MO) wrote (3.9 years ago):
If you're only going to run one variant caller, Mutect is probably the way to go.
written 3.9 years ago by Chris Miller

Does this hold even if I have (impure) tumor samples with no matching normals? I read that MuTect2 is great for pure tumor samples because it picks up low-VAF variants, but for impure ones it can be too sensitive (many false positives). Does the fact that I have dbSNP and COSMIC VCFs ensure that MuTect is good for my use case? Thank you for your help.

written 3.9 years ago by umn_bist

No variant caller that I've seen yet is great at low-VAF calling. Impure tumors are more difficult, because the signal is depressed and closer to the noise level from the error rate of the sequencer/prep. If you push too far down, you begin picking up those errors and get a huge number of false positives. My preference is always for some sort of ensemble calling, followed by filtering, but if you're going to use one caller, I still think that Mutect is a reasonable way to go here.
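
One simple form of the ensemble idea mentioned above is to keep only variants reported by at least some number of callers. The following is a minimal sketch of that voting step, not necessarily the approach used by the poster; `ensemble_calls`, the `(chrom, pos, ref, alt)` key and the default threshold of two callers are all assumptions made for illustration, and the subsequent filtering is not shown.

```python
from collections import defaultdict


def ensemble_calls(calls_by_caller, min_callers=2):
    """Return variants supported by at least `min_callers` callers.

    `calls_by_caller` maps a caller name to an iterable of variant keys,
    each key being a (chrom, pos, ref, alt) tuple. Hypothetical helper.
    """
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for key in variants:
            support[key].add(caller)
    # Keep variants with enough independent support, listing the callers.
    return {
        key: sorted(callers)
        for key, callers in support.items()
        if len(callers) >= min_callers
    }
```

In practice the variant keys would be parsed from each caller's VCF output, and differences in how callers represent indels would need to be normalised before comparing.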

written 3.9 years ago by Chris Miller