Known-variant and AF-only files of GATK Mutect2 on Mouse DNA sequencing data
4 months ago
whb ▴ 20

I have targeted-panel sequencing data from C57 mice and want to call somatic variants using GATK. Because of cost, only 4 tumour samples have a matched normal; the remaining 16 tumour samples have no matching normal.

Q1) Should I use the latest assembly, GRCm39, or GRCm38 as the reference for bwa? My concern is that the files needed in later steps might not be available for GRCm39, e.g. dbSNP is available for GRCm38 but I cannot find it for GRCm39.

Q2) Should I process tumours with and without a matched normal differently in Mutect2, i.e. tumour-with-matched-normal mode for the 4 T-N matched samples and tumour-only mode for the remaining 16 samples?
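To make Q2 concrete, here is a minimal dry-run sketch of the two invocations I have in mind. It only prints the commands and does not run GATK. All file and sample names (GRCm38.fa, pon.vcf.gz, mouse_af_only.vcf.gz, T01.bam, N01.bam, N01, T05.bam) are placeholders I made up, and I am assuming the same panel of normals and germline resource would be passed in both modes.

```shell
#!/bin/sh
# Dry-run sketch: prints the Mutect2 commands instead of executing them.
# Every file and sample name here is a placeholder, not a real resource.
set -eu

REF="GRCm38.fa"                  # placeholder reference FASTA
PON="pon.vcf.gz"                 # placeholder panel of normals
GERMLINE="mouse_af_only.vcf.gz"  # placeholder for the AF-only file I cannot find

# Tumour with matched normal: both BAMs are inputs,
# and -normal gives the sample name of the normal.
# $1 = tumour BAM, $2 = normal BAM, $3 = normal sample name, $4 = output VCF
mutect2_paired() {
  echo gatk Mutect2 -R "$REF" \
    -I "$1" -I "$2" -normal "$3" \
    --germline-resource "$GERMLINE" --panel-of-normals "$PON" \
    -O "$4"
}

# Tumour only: a single BAM and no -normal argument.
# $1 = tumour BAM, $2 = output VCF
mutect2_tumour_only() {
  echo gatk Mutect2 -R "$REF" \
    -I "$1" \
    --germline-resource "$GERMLINE" --panel-of-normals "$PON" \
    -O "$2"
}

mutect2_paired T01.bam N01.bam N01 T01.paired.vcf.gz
mutect2_tumour_only T05.bam T05.tumour_only.vcf.gz
```

If this split is the right approach, the only difference between the two groups would be the extra -I and -normal arguments for the 4 matched samples.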

Q3) I have trouble finding these two files:

1) --known-sites sites_of_variation.vcf \ for BaseRecalibrator
2) --germline-resource af-only-gnomad.vcf.gz \ for Mutect2 and PON

I found 2 links for --known-sites sites_of_variation.vcf \.

ftp://ftp-mouse.sanger.ac.uk/REL-1303-SNPs_Indels-GRCm38/

Do I need to prepare the files as per genomics/gatk-mouse-mm10.md at master · igordot/genomics · GitHub? It is taking hours to download one file, and the NCBI connection keeps dropping...

I have also found the following VCF files. Are the two below suitable to use as --known-sites sites_of_variation.vcf \? What is the difference between them, and which one should be used?

Sanger REL-1505 mouse strain specific vcf:

Should I use both C57...SNPs.vcf and C57...indels.vcf as input for --known-sites sites_of_variation.vcf \?
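In case it helps to see what I mean by "both", this is a dry-run sketch of the BaseRecalibrator call I would try, with --known-sites given once per VCF. The names strain_snps.vcf.gz, strain_indels.vcf.gz, GRCm38.fa, sample.bam and sample.recal.table are placeholders I made up and stand in for the real files; the script only prints the command.

```shell
#!/bin/sh
# Dry-run sketch: prints the BaseRecalibrator command instead of executing it.
# All file names are placeholders standing in for the real Sanger VCFs.
set -eu

REF="GRCm38.fa"                # placeholder reference FASTA
SNPS="strain_snps.vcf.gz"      # placeholder for the strain SNP VCF
INDELS="strain_indels.vcf.gz"  # placeholder for the strain indel VCF

# --known-sites is repeated, once for each VCF of known variants.
# $1 = input BAM, $2 = output recalibration table
bqsr_table() {
  echo gatk BaseRecalibrator -R "$REF" -I "$1" \
    --known-sites "$SNPS" --known-sites "$INDELS" \
    -O "$2"
}

bqsr_table sample.bam sample.recal.table
```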

EBI GRCm38 vcf:

Lastly, I cannot find any mouse file to use for the --germline-resource af-only-gnomad.vcf.gz \ input that Mutect2 asks for. Could you help, please?

Sorry for the million questions and thank you in advance!

gatk mutect mouse SNP baserecalibrator