Question: Using GATK4 Mutect2 on mouse data, need a genome snp reference
jc.szamosi50 (Canada) wrote, 13 months ago:

I am using the Mutect2 program from GATK4 to call somatic SNPs in mouse cancer cells, and I want to use the Sanger Mouse Genomes Project's strain-specific VCF file (ftp://ftp-mouse.sanger.ac.uk/) for the --germline-resource argument. However, this argument requires the VCF file to have a POP_AF INFO tag, which the Sanger VCF file lacks. Is there a similar germline SNP file that includes this information?

mouse snp mutect2 gatk vcf • 1.1k views
written 13 months ago by jc.szamosi50

Maybe this is of assistance:

https://github.com/igordot/genomics/blob/master/workflows/gatk-mouse-mm10.md

written 13 months ago by h.mon27k

The link to the NCBI vcfs in that tutorial is broken, but I'll see if I can find one that works and edit this comment.

Edit: The NCBI vcf files are here: ftp://ftp.ncbi.nih.gov/snp/organisms/archive/mouse_10090/VCF/, but they also don't have POP_AF INFO tags, so they won't work for this purpose. Many thanks, however.

modified 13 months ago • written 13 months ago by jc.szamosi50

The human population frequencies come from large-scale studies with thousands of samples. I am not sure anything like that exists for other species. There is obviously a lot of mouse sequencing data available, but I don't think there is any organized version.

written 13 months ago by igor8.6k

Thanks! Do you know of a way to make the snp file work with GATK4 if the POP_AF tag is absent?

written 13 months ago by jc.szamosi50

Technically, that is an optional parameter, so you could skip it.

(iii) Mutect2 also differs from the HaplotypeCaller in that it can apply various prefilters to sites and variants depending on the use of a matched normal (--normal-sample), a panel of normals (PoN; --panel-of-normals) and/or a common population variant resource containing allele-specific frequencies (--germline-resource). If provided, Mutect2 uses the PoN to filter sites and the germline resource and matched normal to filter alleles.

If you cannot find population frequencies, then there is not much you can do.
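As a rough illustration of treating --germline-resource as optional, the sketch below assembles a Mutect2 argument list in Python and adds the PoN and germline resource only when they are supplied. The function name, file names, and sample name are hypothetical, and --tumor-sample is assumed from the GATK 4.0-era interface; check the flags against your installed GATK version before running anything.

```python
def build_mutect2_cmd(reference, tumor_bam, tumor_sample, output_vcf,
                      panel_of_normals=None, germline_resource=None):
    """Assemble a Mutect2 argument list; optional resources are added only if given."""
    cmd = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "--tumor-sample", tumor_sample,  # assumption: flag name from the GATK 4.0-era CLI
        "-O", output_vcf,
    ]
    if panel_of_normals:
        cmd += ["--panel-of-normals", panel_of_normals]
    if germline_resource:
        cmd += ["--germline-resource", germline_resource]
    return cmd


# Hypothetical inputs: a PoN is available, but no germline resource.
cmd = build_mutect2_cmd("mm10.fa", "tumor.bam", "tumor1", "somatic.vcf.gz",
                        panel_of_normals="pon.vcf.gz")
print(" ".join(cmd))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.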

modified 13 months ago • written 13 months ago by igor8.6k

I can skip it, and that's what I've done for now. For technical reasons, my tumor and normal samples need to come from different individuals. I've created a PON from all my normal individuals, but I was hoping for a strain vcf so that I can try to distinguish between among-individual variation in the germline, and actual somatic mutations.

I suppose I could add a fake POP_AF tag to the strain vcf.... I'll have to read more about how that tag is used first, though.
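A minimal sketch of what adding a placeholder POP_AF tag could look like, using plain Python on an uncompressed VCF. The function name, the placeholder value of 0.5, and the `Number=A` header definition are all assumptions made for illustration; the value is invented rather than measured, and whether Mutect2 gives sensible results with such a value is not established here.

```python
def add_placeholder_pop_af(vcf_lines, placeholder_af=0.5):
    """Yield VCF lines with a POP_AF INFO tag added to every record.

    placeholder_af is an arbitrary, made-up value (assumption), not a real frequency.
    """
    header_line = ('##INFO=<ID=POP_AF,Number=A,Type=Float,'
                   'Description="Placeholder allele frequency (not measured)">')
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Declare the new tag just before the column header line.
            yield header_line
            yield line
        elif line.startswith("#") or not line:
            yield line
        else:
            fields = line.split("\t")
            n_alt = len(fields[4].split(","))  # one value per ALT allele
            tag = "POP_AF=" + ",".join([str(placeholder_af)] * n_alt)
            fields[7] = tag if fields[7] in (".", "") else fields[7] + ";" + tag
            yield "\t".join(fields)


# Tiny made-up example input.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t3000023\t.\tC\tA\t.\tPASS\tDP=10",
    "1\t3000185\t.\tG\tT,C\t.\tPASS\t.",
]
for out in add_placeholder_pop_af(example):
    print(out)
```

The sketch assumes every record has at least the eight fixed VCF columns; a real strain VCF would also need to be re-compressed and re-indexed afterwards.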

written 13 months ago by jc.szamosi50

If your goal is to discard germline variants, I would suggest annotating the Mutect2 output with dbSNP (or another file of interest) using a tool like SnpSift, and then filtering out the annotated variants.

Alternatively, you can always go back to GATK3, in which the dbsnp parameter is active.
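The annotate-then-filter idea can be sketched without SnpSift as a plain-Python stand-in: collect the known germline alleles from a strain or dbSNP VCF, then drop Mutect2 calls whose alleles are all already known. The function names and the toy records are hypothetical, and the matching is an exact comparison of chromosome, position, REF, and ALT, so it assumes both files use the same contig names and variant representation.

```python
def load_known_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from a germline or strain VCF."""
    known = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            known.add((chrom, int(pos), ref, alt))
    return known


def drop_known_germline(call_lines, known):
    """Yield header lines and records with at least one ALT allele not in `known`."""
    for line in call_lines:
        line = line.rstrip("\n")
        if line.startswith("#") or not line.strip():
            yield line
            continue
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        if any((chrom, int(pos), ref, alt) not in known for alt in alts.split(",")):
            yield line


# Made-up records: one call matches a known strain variant, one does not.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
strain = [header, "1\t100\trs1\tA\tG\t.\t.\t."]
calls = [header, "1\t100\t.\tA\tG\t.\tPASS\t.", "1\t200\t.\tC\tT\t.\tPASS\t."]
kept = list(drop_known_germline(calls, load_known_sites(strain)))
print("\n".join(kept))
```

Holding all known sites in a set is only practical for modestly sized files; for genome-wide resources an indexed lookup would be a more realistic choice.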

written 10 months ago by 2nelly170

I'm using GATK4 to call mouse tumor variants, too. For the same reason, I skip the germline resource. But I think this may be what we need (link missing). You can find this in the README:

If available for this species, the file includes information on:
- ancestral_allele
- evidence
- clinical_significance
- global minor allele, frequency and count

Also, I would like to know which file you used for -V in GetPileupSummaries. It seems that GetPileupSummaries also needs the VCF file to have a MAF tag.

written 7 months ago by d5016227310

Hi, I am facing the same problem, as I am analyzing tumor samples from B6 mice and don't have matched normals for all my samples. Did you find a germline resource or PoN compatible with B6 mice? Also, as I understand it, the --dbsnp option is useless in GATK4, so I haven't found a way to make use of the Sanger 137snp.vcf file. Did you find a way around it, or what did you do for your samples in the end?

written 6 months ago by tarekzakaria.badr30