GATK Mutect2 Panel of Normals (PON) generation
3.8 years ago
cocchi.e89 ▴ 260

I am trying to generate a PON for WES samples following the GATK recommendations. There is another explanation in this Mutect2 article, but it describes essentially the same 3-step procedure:

Step 1. Run Mutect2 in tumor-only mode for each normal sample:

gatk Mutect2 -R reference.fasta -I normal1.bam --max-mnp-distance 0 -O normal1.vcf.gz

Step 2. Create a GenomicsDB from the normal Mutect2 calls:

gatk GenomicsDBImport -R reference.fasta -L intervals.interval_list --genomicsdb-workspace-path pon_db -V normal1.vcf.gz -V normal2.vcf.gz -V normal3.vcf.gz -V ...

Step 3. Combine the normal calls using CreateSomaticPanelOfNormals:

gatk CreateSomaticPanelOfNormals -R reference.fasta --germline-resource af-only-gnomad.vcf.gz -V gendb://pon_db -O pon.vcf.gz
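For more than a handful of normals, it can help to generate these three commands programmatically. Below is a minimal Python sketch that only builds and prints the command lines; it does not run GATK. The file names (reference.fasta, intervals.interval_list, af-only-gnomad.vcf.gz, pon_db, normalN.bam) are placeholders taken from the steps above, and deriving each VCF name from the BAM name is my own assumption.

```python
import shlex
from pathlib import Path

# Placeholder inputs, mirroring the file names used in the three steps above.
REFERENCE = "reference.fasta"
INTERVALS = "intervals.interval_list"
GERMLINE = "af-only-gnomad.vcf.gz"
WORKSPACE = "pon_db"


def pon_commands(normal_bams):
    """Return the GATK command lines (as argument lists) for the 3-step PON workflow."""
    commands = []
    vcfs = []
    # Step 1: one tumor-only Mutect2 run per normal BAM (normal1.bam -> normal1.vcf.gz).
    for bam in normal_bams:
        vcf = Path(bam).stem + ".vcf.gz"
        commands.append(["gatk", "Mutect2", "-R", REFERENCE, "-I", bam,
                         "--max-mnp-distance", "0", "-O", vcf])
        vcfs.append(vcf)
    # Step 2: import every per-normal VCF into a single GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport", "-R", REFERENCE, "-L", INTERVALS,
                  "--genomicsdb-workspace-path", WORKSPACE]
    for vcf in vcfs:
        import_cmd += ["-V", vcf]
    commands.append(import_cmd)
    # Step 3: read the workspace back through a gendb:// URI, not as a plain file.
    commands.append(["gatk", "CreateSomaticPanelOfNormals", "-R", REFERENCE,
                     "--germline-resource", GERMLINE,
                     "-V", "gendb://" + WORKSPACE, "-O", "pon.vcf.gz"])
    return commands


if __name__ == "__main__":
    for cmd in pon_commands(["normal1.bam", "normal2.bam", "normal3.bam"]):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to subprocess.run instead.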

I am using GATK 4.1.7 (the latest at the moment), but the output I got from step 2 (GenomicsDBImport) is a folder with some files in it, such as vcfheader.vcf and vidmap.json, plus what looks like one entry for every chromosome, named with the contig and the boundaries specified in the BED file separated by $ (e.g. X$200786$155255277).
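To see which contigs and boundaries ended up in the workspace, the entry names can be parsed. This is a small Python sketch that assumes every partition entry follows the contig$start$end pattern in the example above. It is based only on those observed names, not on any documented GenomicsDB layout, and it skips everything else (vcfheader.vcf, vidmap.json, and so on).

```python
import sys
from pathlib import Path


def parse_partition(name):
    """Split a name like 'X$200786$155255277' into (contig, start, end).

    Returns None for names that do not follow the contig$start$end pattern.
    """
    parts = name.rsplit("$", 2)
    if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
        return None
    contig, start, end = parts
    return contig, int(start), int(end)


def list_partitions(workspace):
    """Return the parsed partitions found directly inside a GenomicsDB workspace folder."""
    found = []
    for entry in sorted(Path(workspace).iterdir()):
        parsed = parse_partition(entry.name)
        if parsed is not None:
            found.append(parsed)
    return found


if __name__ == "__main__":
    workspace = sys.argv[1] if len(sys.argv) > 1 else "pon_db"
    if Path(workspace).is_dir():
        for contig, start, end in list_partitions(workspace):
            print(f"{contig}\t{start}\t{end}")
    else:
        print(f"{workspace} is not a directory", file=sys.stderr)
```

Comparing this output against the BED file is a quick check that GenomicsDBImport picked up the intended intervals.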

If I pass this directory to the -V option of CreateSomaticPanelOfNormals (step 3), I get an error saying that the specified input is not a regular file, and the GATK documentation confirms that -V is supposed to be a VCF file.

Does anybody who has generated PONs before, or has worked with this, know exactly which output of step 2 I am supposed to pass to -V in step 3?

Thank you very much in advance for any help!

gatk mutect2 panel-of-normal wes non-tumor • 5.9k views

Can you not use the PON files they make available here?


No, my samples are hg19. Moreover, I'd like to be sure that the PON comes from samples prepared with the same kit.


I am completely new to bioinformatics and am looking for a solution for whole-exome somatic variant calling.

gatk GenomicsDBImport -R reference.fasta -L intervals.interval_list --genomicsdb-workspace-path pon_db -V normal1.vcf.gz -V normal2.vcf.gz -V normal3.vcf.gz -V ...

About this command specifically: is it necessary to provide an interval or interval list here, given that I'm working with whole-exome data? I forgot to mention that I'm using GATK version 4.2.0.0.


Yes, because GATK has no way to know which targets were captured in your exome assay. Every kit out there is slightly different and may be based on a specific genome build. Your kit manufacturer should already have this file available.


Thanks a lot for such a prompt response. I don't think I have that list. All I know is that it's whole-exome data and that I mapped it to hg38. Could I just provide a list of all the chromosomes, something like chr1, chr2, chr3, chr4, ..., chrX, chrY?


If you targeted just the exome, then you can't provide the entire genome. Can you check which kit was used to prepare your samples? If you don't have that information, check with whoever prepared the samples.

If that does not work, the Broad Institute makes a generic interval list for GRCh38 available here. You could use it, with the caveat that the list may not match your data 100%.


Thanks! I will check that.

3.8 years ago

I've done this quite recently, with what I hope is the latest version. In my case, I had to pass the workspace to -V like this:

-V gendb://pon_db

The gendb:// prefix should be kept exactly as is; what follows it is the path to the directory created by GenomicsDBImport (the value given to --genomicsdb-workspace-path).
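If the PON step is scripted, a small helper can build the -V value and catch the "passed a bare folder or wrong path" mistake early. This is a minimal Python sketch. The check for vcfheader.vcf and vidmap.json is an assumption based only on the files the question reports in the workspace, not a full validation of a GenomicsDB workspace, and the helper names are hypothetical.

```python
from pathlib import Path

# Two of the files the question reports inside the GenomicsDBImport output folder.
EXPECTED_FILES = ("vcfheader.vcf", "vidmap.json")


def missing_workspace_files(workspace):
    """List expected files absent from the workspace (empty list means it looks plausible)."""
    return [name for name in EXPECTED_FILES if not (Path(workspace) / name).is_file()]


def to_gendb_uri(workspace):
    """Turn a workspace path into a -V value: 'gendb://' followed by the folder path."""
    return "gendb://" + str(Path(workspace))


if __name__ == "__main__":
    workspace = "pon_db"  # same name as --genomicsdb-workspace-path in step 2
    missing = missing_workspace_files(workspace)
    if missing:
        print(f"{workspace} is missing {missing}; check the GenomicsDBImport run")
    else:
        print("-V", to_gendb_uri(workspace))
```

Because the path is simply appended, an absolute workspace path such as /data/pon_db yields gendb:///data/pon_db (three slashes).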


Looks like it did the trick. Thank you very much, @WouterDeCoster!
