Forum:Sanger sequencing is no longer the gold standard?
4.0 years ago
lamteva.vera ▴ 210

Sanger sequencing is no longer the gold standard, ... because there is potential for allele dropout due to polymorphic positions under primers or unknown heterozygous deletions. When this happens, the sequencing may either miss variants or may erroneously assign homozygosity to a heterozygous/hemizygous variant. What’s more, Sanger sequencing can only detect a minimum allele frequency of 15 percent to 20 percent.

says Josh Deignan, PhD, associate director of the UCLA molecular diagnostics laboratories.

Do you use Sanger confirmation in your lab? In which cases is it acceptable to skip Sanger sequencing?


is it acceptable to skip Sanger sequencing?

Not until the regulatory agency in {insert your country name here} says you can (if one deals with human diagnostic samples).

4.0 years ago

I worked in the National Health Service in England, where my role was specifically to install an automated NGS pipeline that could match Sanger and that adhered to all UK- and EU-based regulations. We did it, after much hard work. That said, I have major concerns about NGS actually replacing Sanger, and about ever calling NGS the 'gold standard'.

My main concerns surround:

  1. Inefficient sequencing technology: MiSeq, Ion Torrent, etc. have uncomfortably high error rates, including the incorporation of incorrect bases during sequencing and errors in base calling.
  2. 'Buggy' analysis tools, where fixes and patches are applied seemingly on a monthly basis.
  3. Lack of standardisation: when was the last time you encountered a corrupted VCF because someone decided to produce a 'variant' of the VCF format (pun intended)? Also, remember the MAPQ debate, where each tool has its own interpretation of what MAPQ actually is?
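On the standardisation point, even a minimal format check catches many nonstandard 'variants' of the VCF format before they corrupt a pipeline. A pure-Python sketch, purely for illustration (the `check_vcf_line` helper and the example lines are hypothetical, not from any real tool):

```python
def check_vcf_line(line):
    """Minimal sanity check on a VCF data line: the 8 fixed, tab-separated
    columns must be present and POS must be an integer. Raises ValueError
    on the kind of nonstandard output that silently breaks downstream tools."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:  # CHROM POS ID REF ALT QUAL FILTER INFO
        raise ValueError(f"expected >= 8 tab-separated columns, got {len(fields)}")
    if not fields[1].isdigit():
        raise ValueError(f"POS is not an integer: {fields[1]!r}")
    return fields

print(check_vcf_line("chr1\t12345\t.\tA\tG\t50\tPASS\tDP=100")[:2])  # ['chr1', '12345']
# A space-separated 'variant' of the format fails immediately:
# check_vcf_line("chr1 12345 . A G 50 PASS DP=100")  raises ValueError
```

Real validators (e.g. those distributed with the VCF specification) go much further, but even this level of checking would have caught many of the corrupted files mentioned above.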

All that said, with NGS, there is strength in confirming results via multiple runs of the same sample on different platforms or by downsampling a single sample and recalling variants on the subsamples (and then finding consensus variants).

If I was head of a clinical genetics laboratory, I would never ditch Sanger completely. As I know a few heads of laboratories in both the UK and USA, I know that they are also in agreement.


Well said; points 2) and 3) in particular have my full agreement.


Thanks for the reply, Kevin! The concept of "downsampling a single sample and recalling variants on the subsamples (and then finding consensus variants)" is new to me - could you please provide some details? Update: ah, I see now, this is a unique part of your pipeline.


Hi lamteva.vera,

It really just involves some high-quality QC and then, on the final BAM, using Picard DownsampleSam to extract multiple subsets of 'random' reads at fractions of 25%, 50%, and 75% (or even more, if you wish), and then calling variants independently on these. At the end, you get the consensus list.
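For illustration, the per-read selection that Picard DownsampleSam performs can be sketched in pure Python as an independent Bernoulli draw per read (the `downsample` helper and the read names are hypothetical stand-ins, not part of the actual pipeline):

```python
import random

def downsample(read_names, fraction, seed=0):
    """Keep each read with probability `fraction`, mimicking the
    per-read Bernoulli selection that Picard DownsampleSam performs."""
    rng = random.Random(seed)
    return [r for r in read_names if rng.random() < fraction]

# Hypothetical read names standing in for a BAM's contents:
reads = [f"read_{i}" for i in range(10_000)]
for fraction in (0.25, 0.50, 0.75):
    kept = downsample(reads, fraction, seed=42)
    print(f"{fraction:.0%} subset: kept {len(kept)} of {len(reads)} reads")
```

Note that the subset sizes are only approximately 25/50/75% of the input, because each read is kept or dropped independently; with a fixed seed the selection is reproducible.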

I followed the development of the GATK 'best practices' over the years, and they continued introducing steps that made the analysis more and more complex yet still never got to the truth. The one thing they never implemented was this simple random read selection step.

All variant callers that I've used suffer from false positives, even when the variant has a high frequency in the aligned reads. This method helps to overcome that issue.


would love to see the code for your pipeline, if available


Hey steve,

Yes, apologies, we attempted to publish the work years ago in Genetics in Medicine (journal of the American College of Medical Genetics and Genomics I believe), but as we were a laboratory in the English health service, there is no set funding for research. We got reviewers' comments back but then ran out of time to do further work. In a clinical laboratory, as you can understand, the priority is just getting patient results back. Every week there's seemingly an emergency.

The only unique part of the pipeline, which I've since seen repeated in at least one other laboratory, is to create multiple subsets of each aligned BAM and to call variants independently on these [subsets]. At the end, you then get the consensus list of variants. These subsets just contain a random selection of reads, for example, extract 3 subsets containing a 75%, 50%, and 25% random selection of reads. What we found in the laboratory is that some Sanger-confirmed variants won't be called in the original BAM, but may be called in one of the subsets.
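The thread doesn't spell out the exact consensus rule, so here is one plausible reading sketched in pure Python: keep any variant called in at least a minimum number of the per-subset call sets (the `consensus_variants` helper and the variant tuples are hypothetical illustrations, not the lab's actual code):

```python
from collections import Counter

def consensus_variants(callsets, min_support=2):
    """Return variants present in at least `min_support` of the per-subset
    call sets. Each call set is a set of (chrom, pos, ref, alt) tuples."""
    counts = Counter(v for cs in callsets for v in set(cs))
    return {v for v, n in counts.items() if n >= min_support}

# Hypothetical calls from the 75%, 50%, and 25% subsets:
subset_75 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
subset_50 = {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")}
subset_25 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}

consensus = consensus_variants([subset_75, subset_50, subset_25])
print(sorted(consensus))  # [('chr1', 100, 'A', 'G'), ('chr2', 200, 'C', 'T')]
```

Lowering `min_support` to 1 would instead keep variants seen in any subset, which matches the observation that some Sanger-confirmed variants appear only in a subset and not in the original BAM.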

That was a few years ago, so perhaps methods of variant calling have since overcome these issues. We didn't have funding to do much more.

Disclaimer: I don't believe in true randomness ;)


The subsetting is quite interesting. Did you come up with any theories regarding why higher coverage was masking (presumably true) variants?


Hey Devon, yes, it's as if they 'fall out' of the probability window of being detected, and that this is heavily governed by the overall read-depth - this is only applicable to germline variants though. We looked at many of these 'missed' variants on IGV and they were clear variant calls, and even Sanger-confirmed. We used default GATK settings back then, and ignored anything called around a homopolymer.

SAMtools improved the situation on SNVs but we still noticed some drop-outs. I quickly moved onto RNA- and ChIP-seq back then, so I never looked further.

If you have time, I'd encourage you to research this further!


Hi Kevin, would it be possible to upload the unpublished manuscript to a preprint server like bioRxiv, arXiv, or PeerJ, so that the research community can still benefit from your findings? They are free to use.

I would understand if the clinical setting makes it difficult.

Thanks for considering it :)


Hey, I originally sent it to Genetics in Medicine back in 2014. They sent it for peer review and we received comments in return. They were going to publish it but, as an already stretched clinical genetics laboratory, we had no time to do any further work on it. Back then, NHS labs in England were in survival mode and fearing closure. Staff had to leave the lab to save money but nobody really wanted to voluntarily leave - it became difficult, as you can imagine.

We got our message out through conferences - me presenting in Manchester, Dundee, and in the local Sheffield area. The work always generated interest.

In the end, the lab has survived and still uses the same pipeline that I installed. I retired a modified version of the code to GitHub: https://github.com/kevinblighe/ClinicalGradeDNAseq

The key part is from line 156 in the AnalysisMasterVersion1.sh file

Keep in mind that I wrote the bare bones of that when I was much less experienced and had just stepped out of academia into the real world, in what was seemingly a baptism of fire. I would structure it differently were it written today.

Edit: I realise just now that I had written most of this above in my other comment!

4.0 years ago
Joe 19k

Sanger sequencing is still the de facto method for us in the lab when it comes to confirming cloning constructs, where you may only need a kilobase of sequence or so. It costs about £2 a sample.

For anything more complicated, I’d say the gold standard by now would be Illumina (usually a MiSeq), combined with a long read technology, either PacBio or increasingly Nanopore.

4.0 years ago
lamteva.vera ▴ 210

To everybody interested in the topic: have a look at this article.


Thanks very much for that - I will read it.

4.0 years ago
tjduncan ▴ 270

See the white paper below from Invitae on their confirmation methods. They must be doing something right, as they have more than doubled their revenues during the last quarter: https://www.genomeweb.com/molecular-diagnostics/invitae-more-doubles-q3-revenues

They talk about the orthogonal methods they use to confirm "messy" NGS variants. Basically, it looks like their orthogonal methods are Sanger, PacBio long-read sequencing, aCGH, and MLPA.

Invitae NGS Confirmation White Paper

The interesting thing here is their mention of PacBio. Doing a quick "PacBio + Invitae" Google search, you can find that their variant confirmation pipeline is called SMRTer Confirmation, and they have a poster explaining the pipeline:

https://marketing.invitae.com/acton/attachment/7098/f-04c6/1/-/-/-/-/Invitae_2016_ASHG_Confirmation_McCalmon.pdf

"We have demonstrated that the use of PacBio’s platform is able to meet both demands by incorporating high quality data into an automated confirmation pipeline for SNV’s and small indels. We have demonstrated that PacBio is not only equivalent, but superior to Sanger sequencing for confirmation purposes through the analysis of a feasibility data set of 730 amplicons containing 252 unique patient variants (96.4% vs. 81.7%). This approach was rigorously validated with a unique variant set of 30 distinct SNV’s and indels representing a wide range of sequence contexts and sizes, in which we demonstrated 100% accuracy with Invitae’s PacBio-driven “SMRTer-Confirmation” pipeline across three independent sequencing runs. As the volume of samples increases, we will be able to keep pace with the demands of the confirmation burden with minimal additional capital equipment costs or hands-on labor needs due to the enhanced scalability of the multiplexed PacBio workflow. Further cost and time savings will be possible through the ability to multiplex up to 384 amplicons per pool."


Thanks for that - interesting. I guess we should make a distinction between short-read NGS (e.g. typical Illumina) and long-read NGS (e.g. PacBio). I would have much greater confidence in longer reads, yet the market is dominated by Illumina, possibly due to just clever marketing of an inferior product.


market is dominated by Illumina

Possibly because they got to the market first. By the time the PGM was released, everyone had already bought an Illumina sequencer.
