Question: Publicly Available WGS Samples for Panel of Normals?
12 days ago, tjbencomo wrote:

I have 15 tumors that were sequenced using whole genome sequencing at 30x depth. I would like to identify somatic variants using Mutect2. Unfortunately we don't have any normal samples, so I would like to build a Panel of Normals (PoN) to use with Mutect2 in tumor-only mode. I also plan on running Mutect2 with gnomAD to help filter germline variants.

Are there any resources that host publicly available WGS samples I could use to construct the Panel of Normals? Ideally I'm looking for healthy blood samples sequenced using TruSeq library prep on the Illumina NovaSeq platform.

I've checked out the 1000 Genomes Project, but it appears the sequencing technology they used doesn't match my own (probably because of how long ago the project finished). Are there newer resources that would have WGS samples with technical properties similar to those of my samples?

Furthermore, even if you aren't aware of samples with those specific properties, what resources do people use for WGS Panel of Normals creation if they don't have in-house samples?

sequencing snp wgs • 132 views
modified 4 days ago • written 12 days ago by tjbencomo

You could use ExAC or gnomAD as a stand-in for a Panel of Normals. There are also other files you could use, such as the Mutect2-exome-panel.vcf for hg19 from this folder or the 1000g PoN file for hg38 from this folder.

written 12 days ago by RamRS

It was my understanding that the PoN should consist of samples that were sequenced in a similar way to my own samples. I guess using the 1000g PoN could be reasonable (although the assumption there is that their prep kit/sequencer is similar to mine - probably not a bad assumption, though not an optimal one), but aren't exomes sequenced using rather different protocols than genomes? Is it common practice to use PoNs generated from exome samples for WGS variant calling and vice versa?

modified 11 days ago • written 11 days ago by tjbencomo

I'm not really sure. Common sense dictates that platform/protocol-matched normals should be the ones used to form the PoN. However, there must be some sort of middle ground between "it needs to be sequenced on the same kind of machine using the same protocol" and "it could have been sequenced anywhere". Maybe there is an acceptable difference in depth, or acceptable tweaks to the parameters used when comparing mutation entries in tumor samples against those in PoN samples. I'm a beginner at this too, so I'd consult other experts on the forum.

written 11 days ago by RamRS

As an alternative, you may use VarScan2 and remove variants that are present in gnomAD. Not perfect, but better than some limited panel of normals.
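A minimal sketch of what "remove variants that are present in gnomAD" could look like, assuming plain-text VCF records, biallelic sites, and an exact match on CHROM, POS, REF and ALT with no allele normalization. The function names and toy records are hypothetical, and this is not something VarScan2 does itself; for a real WGS callset a dedicated VCF tool would be a better fit than an in-memory set.

```python
def variant_key(record_line):
    """Return (chrom, pos, ref, alt) from one tab-separated VCF record line."""
    chrom, pos, _id, ref, alt = record_line.rstrip("\n").split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def load_known_keys(vcf_lines):
    """Collect the keys of all records in a population VCF (headers skipped)."""
    return {
        variant_key(line)
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def drop_known_variants(call_lines, known_keys):
    """Keep header lines and any call whose key is absent from known_keys."""
    kept = []
    for line in call_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#") or variant_key(line) not in known_keys:
            kept.append(line)
    return kept


# Toy data standing in for a gnomAD VCF and a tumor-only callset.
gnomad_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1000\t.\tA\tG",
]
call_lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1000\t.\tA\tG",  # also in the population file, so it is dropped
    "chr1\t2000\t.\tC\tT",  # not in the population file, so it is kept
]

kept = drop_known_variants(call_lines, load_known_keys(gnomad_lines))
for line in kept:
    print(line)
```

Dropping every variant seen in gnomAD regardless of allele frequency is a blunt filter; an allele-frequency threshold would be a natural refinement, but that goes beyond what is sketched here.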

written 12 days ago by German.M.Demidov

I forgot to mention that I plan to run Mutect2 with a PoN and gnomAD to filter for germline variants. Would it be correct to say VarScan2 does the same thing as Mutect2 with the gnomAD VCF - filter germline variants using gnomAD?

written 11 days ago by tjbencomo

Not really, it is a totally different variant caller, which we use to call tumor variants in the absence of a matched normal :) but well, then stick to your plan!

written 11 days ago by German.M.Demidov
4 days ago, tjbencomo wrote:

I ended up using the hg38 1000 Genomes PoN from the Broad that @RamRS linked to. Although the PoN was most likely generated from samples sequenced using different prep protocols, one of the GATK developers says on the GATK forum that it is still useful for catching mapping artifacts:

Because most errors caught by the panel of normals are mapping artifacts these are still useful despite changes in sequencing technology. "1000g_pon.hg38.vcf" is an hg38 panel of normals for both exomes and whole genomes generated from 1000 Genomes Project samples. Finally, "af-only-gnomad.hg38.vcf" is a copy of the gnomAD VCF stripped of all unnecessary INFO fields. It is used for the -germline-resource argument.
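A rough sketch of how those two resources might be wired into a tumor-only Mutect2 call, written as a small Python helper that only builds the command line and does not run it. The helper name and all file paths are placeholders; the flags are spelled the way current GATK4 documentation lists them (`--panel-of-normals`, `--germline-resource`), so check them against the GATK version you have installed.

```python
import shlex


def mutect2_tumor_only_cmd(reference, tumor_bam, pon_vcf, germline_vcf, out_vcf):
    """Build the argument list for a tumor-only Mutect2 run (nothing is executed)."""
    return [
        "gatk", "Mutect2",
        "-R", reference,                       # reference FASTA
        "-I", tumor_bam,                       # tumor BAM, no matched normal
        "--panel-of-normals", pon_vcf,         # e.g. the 1000 Genomes PoN
        "--germline-resource", germline_vcf,   # e.g. the AF-only gnomAD VCF
        "-O", out_vcf,                         # unfiltered calls
    ]


cmd = mutect2_tumor_only_cmd(
    reference="reference.hg38.fasta",          # placeholder path
    tumor_bam="tumor01.bam",                   # placeholder path
    pon_vcf="1000g_pon.hg38.vcf",
    germline_vcf="af-only-gnomad.hg38.vcf",
    out_vcf="tumor01.unfiltered.vcf.gz",       # placeholder path
)

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

The output of this step is unfiltered; in the usual GATK workflow it is then passed through FilterMutectCalls before the calls are interpreted.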

written 4 days ago by tjbencomo