Confusion with somatic variant calling: Advice on Ensembl-compatible resources?
23 days ago
DGTool • 0

Hi,

I am currently trying to do somatic variant calling on some tumor samples, but I keep running into errors and am confused about which resource files to use.

My reads were aligned to Ensembl's GRCh38 reference genome, and I am using GATK's Mutect2 for the somatic variant calling. As recommended on their site, I used GATK's provided 1000g_pon.hg38.vcf and af-only-gnomad files for the --panel-of-normals (-pon) and --germline-resource arguments respectively. The issue is that, because my reads were aligned to Ensembl's genome, the ##contig headers of the PON/gnomAD files do not match those of the input BAM files, and GATK errors out with a message about the reference and feature contigs not matching.
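
To make the mismatch concrete, here is a minimal sketch of how the contig names could be compared before running Mutect2. It only parses `##contig` header lines and the first column of a samtools-style `.fai` index; the function names and the toy header and index lines are made up for illustration and are not taken from the actual files.

```python
import re

def vcf_contigs(header_lines):
    """Collect contig IDs from '##contig=<ID=...>' VCF header lines."""
    ids = []
    for line in header_lines:
        m = re.match(r"##contig=<ID=([^,>]+)", line)
        if m:
            ids.append(m.group(1))
    return ids

def fai_contigs(fai_lines):
    """Collect contig names from a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]

def contig_mismatch(vcf_ids, ref_ids):
    """Return (names only in the VCF, names only in the reference)."""
    vcf_set, ref_set = set(vcf_ids), set(ref_ids)
    return sorted(vcf_set - ref_set), sorted(ref_set - vcf_set)

# Toy input: UCSC-style names in the VCF, Ensembl-style names in the reference.
vcf_header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
    "##contig=<ID=chrM,length=16569>",
]
fai = [
    "1\t248956422\t3\t60\t61",
    "MT\t16569\t253105711\t60\t61",
]

only_vcf, only_ref = contig_mismatch(vcf_contigs(vcf_header), fai_contigs(fai))
print("only in VCF:", only_vcf)
print("only in reference:", only_ref)
```

If both lists come back non-empty and differ only by a `chr` prefix (plus `chrM` versus `MT`), the problem is most likely just the naming convention rather than a different assembly.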

I've been looking for Ensembl-compatible files, and so far I have only found that Ensembl provides a 1000GENOME_phase3.vcf (from here), which I am guessing would be used for the --panel-of-normals parameter? I haven't been able to find an alternative for the germline resource. I was looking into NCBI's common_all.vcf, which is described as a resource for common germline variants. Would it work for that? Mutect2's manual at least says any VCF containing the "AF" INFO field is valid, but I don't know whether there is a commonly used alternative to the gnomAD file. Alternatively, I could rename all the contigs in 1000g_pon.hg38.vcf to match the Ensembl names, but I don't know whether that would be more troublesome or introduce errors.
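
For the renaming option, a minimal sketch of what that might look like is below. It assumes the only difference on the primary chromosomes is the `chr` prefix and `chrM` versus `MT`, and it simply drops anything not in the mapping (alt, decoy, and unplaced contigs), which is a simplification that may not be appropriate for every analysis. The function and variable names are hypothetical, and the records are toy data. The same renaming can also be done with `bcftools annotate --rename-chrs` and a two-column mapping file.

```python
import re

# Assumed mapping for primary chromosomes only: chr1..chr22, chrX, chrY, chrM.
UCSC_TO_ENSEMBL = {f"chr{i}": str(i) for i in range(1, 23)}
UCSC_TO_ENSEMBL.update({"chrX": "X", "chrY": "Y", "chrM": "MT"})

CONTIG_HEADER = re.compile(r"(##contig=<ID=)([^,>]+)(.*)")

def rename_vcf_line(line):
    """Rename the contig on one VCF line (without trailing newline).

    Returns the rewritten line, or None when the contig is not in the
    mapping, so the caller can drop that header entry or record.
    """
    m = CONTIG_HEADER.fullmatch(line)
    if m:
        new_id = UCSC_TO_ENSEMBL.get(m.group(2))
        return None if new_id is None else m.group(1) + new_id + m.group(3)
    if line.startswith("#"):
        return line  # other header lines pass through unchanged
    chrom, sep, rest = line.partition("\t")
    new_chrom = UCSC_TO_ENSEMBL.get(chrom)
    return None if new_chrom is None else new_chrom + sep + rest

# Toy VCF content, not real records.
demo = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
    "##contig=<ID=chrM,length=16569>",
    "##contig=<ID=chrUn_KI270302v1>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\tAF=0.01",
    "chrM\t200\t.\tC\tT\t.\t.\tAF=0.02",
    "chrUn_KI270302v1\t300\t.\tG\tA\t.\t.\tAF=0.03",
]

converted = [out for out in map(rename_vcf_line, demo) if out is not None]
print("\n".join(converted))
```

On real files the lines would be streamed from something like `gzip.open(path, "rt")`, and the output would need to be bgzip-compressed and re-indexed before GATK will accept it. Renaming only changes labels, so it is safe only when both references are the same GRCh38 assembly with identical sequence for the renamed contigs.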

I guess I am confused about which files are available and which can be used for each of those parameters. I've looked at some previous posts, but haven't been able to find a concrete answer on the recommended approach.


I have also found this on Ensembl's FTP, which seems to have per-chromosome gnomAD files, but no "combined" file like the one included in GATK's resource bundle. The file sizes are also vastly different (GATK's af-only-gnomad is ~3GB in total, whereas Ensembl's files are ~1-2GB per chromosome). Another question is whether to use gnomAD's exomes or genomes files. It seems that gnomAD v3 doesn't have exome-based data available.
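
One possible explanation for the size difference is that the GATK file, as its "af-only" name suggests, keeps only the allele frequency in the INFO column, while the per-chromosome files presumably carry many more annotations. Under that assumption, and assuming the per-chromosome files do contain a plain `AF` key (not verified here), a reduced file could be produced along these lines. The function name and the records are hypothetical toy examples.

```python
def keep_af_only(line):
    """Reduce one sites-only VCF line (8 columns, no trailing newline) to AF only.

    Drops every ##INFO header definition except the one for AF, passes other
    header lines through, and replaces the INFO column of each record with its
    AF entry, or '.' when the record has no AF.
    """
    if line.startswith("##INFO="):
        return line if line.startswith("##INFO=<ID=AF,") else None
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    af_entries = [kv for kv in fields[7].split(";") if kv.startswith("AF=")]
    fields[7] = af_entries[0] if af_entries else "."
    return "\t".join(fields)

# Toy lines, not real gnomAD content.
demo_lines = [
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\tAC=5;AF=0.01;AN=500;AF_afr=0.02",
    "1\t200\t.\tC\tT\t.\tPASS\tAC=1;AN=500",
]

reduced = [out for out in map(keep_af_only, demo_lines) if out is not None]
print("\n".join(reduced))
```

Applying this to each chromosome in order and writing a single header followed by all records would give one combined file, which would then need bgzip compression and an index.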

Any help is appreciated!

GATK Mutect2 ensembl • 125 views