Question: Exome Sequencing Depth/Target Considerations And Shared Controls
Ryan D wrote (7.8 years ago):

We are proposing to use Agilent SureSelect for whole exome sequencing of four cases with mutations identified in a single gene having 32 exons. Supposing that SureSelect covers 30 of the 32 exons, and that reads are 75% on target with 90% alignment, all four samples could be run at 100x coverage in a single HiSeq lane. The point is to show that an NGS method can identify variants that CLIA labs normally detect with Sanger sequencing kits at 10x the cost. If that works, we plan to extend this to an 80-gene panel.
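The lane-capacity claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes a capture target of about 50 Mb (a hypothetical figure; use the size of your kit's target BED file) and treats the 75% on-target and 90% alignment rates as simple multiplicative losses. Under those assumptions it comes to roughly 7.41 Gb of raw sequence per sample, or about 29.63 Gb for all four; whether that fits in one lane depends on the yield of your particular HiSeq model and chemistry, which isn't specified here.

```python
def raw_bases_needed(target_bp: float, mean_coverage: float,
                     on_target: float, aligned: float) -> float:
    """Raw sequenced bases needed so that aligned, on-target bases reach
    `mean_coverage` over the target. Ignores PCR duplicates and uneven capture."""
    return target_bp * mean_coverage / (on_target * aligned)


# ASSUMPTION: ~50 Mb capture target; take the real size from your kit's BED file.
TARGET_BP = 50e6
per_sample = raw_bases_needed(TARGET_BP, mean_coverage=100, on_target=0.75, aligned=0.90)
four_samples = 4 * per_sample

print(f"per sample:   {per_sample / 1e9:.2f} Gb")
print(f"four samples: {four_samples / 1e9:.2f} Gb")
```

Duplicates and uneven capture mean that a 100x mean still leaves some bases well below 100x, so the depth achieved over each exon of the gene of interest is the number to check after sequencing.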

In your considered opinions, is coverage with these parameters sufficient? Further, if we hope to show that we can identify 2 or 3 of the 4 mutations in this gene using NGS, are there shared control datasets that would be appropriate to use with this sequencing data? Or do you think batch differences between institutions, machines, and DNA sample preparation would instead warrant using only controls sequenced with the same Agilent kit, at similar coverage, on the same machines at our institution? Better to have a solid experimental design before proceeding than to have grant reviewers shred us for comparing apples to oranges.

Please share your experiences or any relevant publications, or edit the tags above. Thanks.

Tags: exome, next-gen
Alex Paciorkowski (Rochester, NY USA) wrote (7.8 years ago):

Your coverage estimate sounds about right, and I bet this experimental design is being run in several labs right now as NGS moves to replace Sanger methods in the clinical arena. How to validate whole exome sequencing and compare it head-to-head with Sanger is a hot topic. One issue, as you've alluded to with your 2 uncovered exons, is how to fill the gaps not covered by exome sequencing; designing Sanger "band-aids" to cover these areas does feel like a bit of a nuisance.
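Deciding where the Sanger "band-aids" go amounts to an interval-overlap problem: for each exon, how much of it falls under the capture baits? The sketch below computes the bait-covered fraction of each exon and lists those falling short. The exon and bait coordinates are hypothetical, all intervals are assumed to lie on one contig in 0-based, half-open (BED-style) form, and bait footprint is only a proxy; the read depth actually achieved in the BAM is the better test.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based like BED

def covered_bases(exon: Interval, baits: List[Interval]) -> int:
    """Number of exon bases overlapped by at least one bait (overlaps merged)."""
    start, end = exon
    # Clip each overlapping bait to the exon boundaries, then sort by start.
    clipped = sorted((max(s, start), min(e, end)) for s, e in baits if s < end and e > start)
    total = 0
    run_start = run_end = None
    for s, e in clipped:
        if run_end is None or s > run_end:
            # Disjoint from the current run: close it out and start a new one.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = s, e
        else:
            run_end = max(run_end, e)
    if run_end is not None:
        total += run_end - run_start
    return total

def exons_needing_sanger(exons: Dict[str, Interval], baits: List[Interval],
                         min_fraction: float = 1.0) -> List[Tuple[str, float]]:
    """Exons whose bait-covered fraction falls below `min_fraction`."""
    gaps = []
    for name, (start, end) in exons.items():
        frac = covered_bases((start, end), baits) / (end - start)
        if frac < min_fraction:
            gaps.append((name, round(frac, 2)))
    return gaps

# Hypothetical coordinates on a single contig
exons = {"exon1": (100, 200), "exon2": (300, 400), "exon3": (500, 600)}
baits = [(90, 210), (300, 350), (700, 800)]
print(exons_needing_sanger(exons, baits))
```

Here exon1 is fully covered, exon2 is half covered and exon3 not at all, so the last two would be the band-aid candidates.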

As for controls, you will need mutation-negative controls to show that artifactual variants introduced by NGS are recognized by your informatics pipeline and not carried through to the final results. You will also need a blinded cohort of mutation-positive and mutation-negative samples (unknowns) to show that you can classify them correctly. All of the NGS results will need to be confirmed by Sanger sequencing to demonstrate the validity of NGS against the current "gold-standard" method.
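Once the blinded cohort is unblinded, the comparison against Sanger reduces to a confusion table. The sketch below scores NGS per sample (mutation present or absent) with Sanger as the gold standard and reports sensitivity and specificity. The sample IDs and results are toy, hypothetical values; a variant-level comparison (matching the exact change, not just positive/negative status) would be stricter.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Concordance:
    tp: int  # mutation-positive by Sanger, called by NGS
    fn: int  # mutation-positive by Sanger, missed by NGS
    fp: int  # mutation-negative by Sanger, called by NGS
    tn: int  # mutation-negative by Sanger, not called by NGS

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) else float("nan")

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp) if (self.tn + self.fp) else float("nan")


def compare_to_sanger(sanger: Dict[str, bool], ngs: Dict[str, bool]) -> Concordance:
    """Per-sample comparison; `sanger` is the gold standard, keys are sample IDs."""
    missing = set(sanger) - set(ngs)
    if missing:
        raise ValueError(f"no NGS result for: {sorted(missing)}")
    tp = sum(1 for s, pos in sanger.items() if pos and ngs[s])
    fn = sum(1 for s, pos in sanger.items() if pos and not ngs[s])
    fp = sum(1 for s, pos in sanger.items() if not pos and ngs[s])
    tn = sum(1 for s, pos in sanger.items() if not pos and not ngs[s])
    return Concordance(tp=tp, fn=fn, fp=fp, tn=tn)

# Toy, hypothetical results after unblinding
sanger = {"S1": True, "S2": True, "S3": True, "S4": True,
          "S5": False, "S6": False, "S7": False, "S8": False}
ngs = {"S1": True, "S2": True, "S3": True, "S4": False,
       "S5": False, "S6": False, "S7": False, "S8": False}
result = compare_to_sanger(sanger, ngs)
print(result, result.sensitivity, result.specificity)
```

With cohorts this small the point estimates are very imprecise, which is one more reason reviewers will want the blinded set to be as large as practical.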

I encourage you to consult with friends/colleagues in pure clinical labs to design the best validation techniques, following CLIA guidelines as much as possible.

I would argue that all of the NGS work should be done at one institution, as there are likely to be lab- and machine-specific artifacts.

A version-tracking workflow tool for your informatics such as Galaxy is a must, and your reviewers will thank you.

A recent review that covers some of these issues is here.


I had not heard the term "Sanger band-aids". If it has not already been coined, it should be.

As I understand it, investigators using this same pipeline will often see a dozen individuals apparently homozygous for a never-before-seen SNP, which then turns out to be a sequencing artifact.
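One simple way to catch this pattern is to count, across the whole batch, how many samples are called homozygous for each variant that is absent from a reference set of known variants. A minimal sketch follows; the variant-key format, the genotype strings, and the `min_samples` cutoff are hypothetical choices, not part of any particular pipeline. A flagged variant is a candidate artifact, not a proven one: a genuine founder variant could produce the same pattern, so flagged calls still need Sanger confirmation.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# (sample_id, variant_key, genotype), e.g. ("S1", "chr1:1000:A>G", "1/1")
Call = Tuple[str, str, str]
HOM_ALT = {"1/1", "1|1"}  # VCF-style homozygous-alternate genotype strings

def suspicious_recurrent_homozygotes(calls: Iterable[Call], known: Set[str],
                                     min_samples: int) -> Dict[str, int]:
    """Novel variants (not in `known`) called homozygous-alt in >= min_samples samples."""
    # De-duplicate so a sample is counted at most once per variant.
    hom = {(sample, variant) for sample, variant, gt in calls
           if gt in HOM_ALT and variant not in known}
    counts = Counter(variant for _, variant in hom)
    return {variant: n for variant, n in counts.items() if n >= min_samples}

# Toy, hypothetical calls: 12 samples homozygous for one unseen variant
calls = [(f"S{i}", "chr1:1000:A>G", "1/1") for i in range(12)]
calls += [("S0", "chr2:500:C>T", "1/1"), ("S1", "chr3:42:G>A", "0/1")]
flagged = suspicious_recurrent_homozygotes(calls, known=set(), min_samples=5)
print(flagged)
```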

Thanks also, Alex, for pointing me to Galaxy to track this. I use it for a number of other UCSC-related tasks. It would be helpful to get some background on using it for NGS workflows. It looks like one is posted here: . Are there any that are considered better?

Reply by Ryan D (7.8 years ago)

I'm not sure how often spurious homozygous variants turn out to be artifacts, but it does happen. It's best to also filter your variants against the >5400 exomes available through the NHLBI Exome Variant Server:

The public Galaxy page now includes a beta GATK install, and the nice folks at Galaxy are really helpful in designing custom workflows to meet your needs.

Reply by Alex Paciorkowski (7.8 years ago)

Hi Alex, I did not see any tools, available or in development, that let us query the NHLBI Exome Variant Server. Can you point me to any workflows that include this, or suggest whom at Galaxy to contact to get it implemented? It sounds like a great resource, though this is the first time I've heard of it.

Reply by Ryan D (7.8 years ago)

Hi Ryan - The ESP5400 SNP data can be downloaded from EVS via their "downloads" page at

You can then use local scripts to query that data. EVS data are not integrated into Galaxy afaik, but if you want to email me off-line I can put you in touch with people who are helping our group design custom workflows in Galaxy.
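As a rough illustration of such a local script, the sketch below loads population allele frequencies into a dictionary keyed on (chromosome, position, ref, alt) and keeps only candidate variants that are absent or rare. It does not parse the EVS download format directly: it assumes (hypothetically) that the files have already been reshaped into a simple tab-separated table, and the 1% `max_af` cutoff is an illustrative default, not a recommendation from this thread.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

def load_population_af(handle: TextIO) -> Dict[Key, float]:
    """Read a tab-separated table with header chrom, pos, ref, alt, af.
    HYPOTHETICAL layout: reshape the downloaded EVS files into this form first."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["chrom"], int(row["pos"]), row["ref"], row["alt"]): float(row["af"])
            for row in reader}

def rare_or_novel(candidates: List[Key], pop_af: Dict[Key, float],
                  max_af: float = 0.01) -> List[Key]:
    """Keep candidates absent from the population table or with AF below `max_af`."""
    return [v for v in candidates if pop_af.get(v, 0.0) < max_af]

# Toy, hypothetical population table and candidate calls
table = "chrom\tpos\tref\talt\taf\n1\t1000\tA\tG\t0.25\n1\t2000\tC\tT\t0.001\n"
pop_af = load_population_af(io.StringIO(table))
candidates = [("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("1", 3000, "G", "A")]
print(rare_or_novel(candidates, pop_af))
```

Chromosome naming (`1` vs `chr1`) and indel representation must match between the two sources, or real matches will silently be missed.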

Reply by Alex Paciorkowski (7.8 years ago)
Ron128 wrote (7.8 years ago):

I would think the computational part is very important, in addition to coverage. What pipeline do you plan to use? We faced the same issue when carrying out exome sequencing of cancer tumours, and we narrowed it down to the pipeline used for variant calling (GATK in our case).


We will use BWA for alignment to produce BAM files for input to GATK v2, as outlined here:

Reply by Ryan D (7.8 years ago)

But GATK v3 is available!

Reply by Alex Paciorkowski (7.8 years ago)

Thanks for the heads up, Alex.

Reply by Ryan D (7.8 years ago)