Question

Whole exome sequencing capture sequence

0

2.7 years ago

enho ▴ 40

Hi Everyone,

I am trying to solve a problem in my analysis, and I want to make sure I understand the procedure used for NGS.

Imagine we want to do whole exome sequencing. The steps would be:

Take the DNA
Fragment the DNA
Add adapters to the ends of the fragments
Use a cDNA based filter to only keep the coding region fragments
Sequence ...

So here is my question: how do we know the cDNA sequence before sequencing? Is it based on GRCh37 (hg19) or GRCh38?

For highly polymorphic regions, such as the HLA or KIR regions, is the cDNA sequence based on the reference genome? (And is it therefore possible that some reads are not captured?)

Thanks,

Exome Sequencing Whole • 965 views

ADD COMMENT • link updated 2.7 years ago by benformatics 3.9k • written 2.7 years ago by enho ▴ 40

score 2 · Answer 1 · 2021-08-17

2.7 years ago

ATpoint 82k

The steps should be exactly what the prep kit you use instructs you to do. WES is super standard and well established, so you do not want to try any custom approaches. The kit manufacturer will provide you with the coordinates they use for capture. WES fragments the DNA and then uses affinity capture probes to capture the exome. The kit also determines whether HLA and company will even be included.
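As a rough illustration of checking whether a region is included, here is a minimal Python sketch that counts how many bases of a region of interest fall inside a kit's target intervals. It assumes the manufacturer's coordinates come as a BED file (0-based, half-open intervals). The intervals and the region below are made-up placeholders, not real kit targets or real HLA coordinates, and `parse_bed` and `covered_bases` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED text, skipping blank lines and 'track'/'browser'/'#' header lines."""
    targets = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        targets.append((chrom, int(start), int(end)))
    return targets


def covered_bases(targets: List[Interval], region: Interval) -> int:
    """Count bases of `region` covered by at least one target (overlaps merged)."""
    chrom, r_start, r_end = region
    # Keep only targets that overlap the region, clipped to the region bounds.
    clipped = sorted(
        (max(s, r_start), min(e, r_end))
        for c, s, e in targets
        if c == chrom and s < r_end and e > r_start
    )
    total, cur_end = 0, r_start
    for s, e in clipped:
        if e > cur_end:
            total += e - max(s, cur_end)
            cur_end = e
    return total


# Made-up example data, NOT taken from any real capture kit.
example_bed = """\
track name="hypothetical_kit_targets"
chr6\t100\t200
chr6\t150\t300
chr6\t900\t1000
chr1\t100\t200
""".splitlines()

region_of_interest = ("chr6", 120, 950)  # placeholder coordinates
targets = parse_bed(example_bed)
covered = covered_bases(targets, region_of_interest)
print(f"{covered} of {region_of_interest[2] - region_of_interest[1]} bases targeted")
```

For a real check, the region coordinates must be on the same genome build as the kit's BED file, and the chromosome naming (`chr6` vs `6`) must match. Note that this only tells you whether a region is targeted by the design; it says nothing about how well divergent haplotypes are actually captured.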


1

Thanks to ATpoint's response, I went ahead and read the reference for a sample exome capture kit. It states, and I quote:

SureSelect human all exon V7 (exome V7) is a new exome kit that maximizes coverage for a given sequencing depth. Designed using GRCh38/hg38 genome assembly, the Exome V7 targets the protein coding regions documented in the latest versions of RefSeq, GENCODE, CCDS, and UCSC known genes, including hard-to-capture exons that are omitted from other commercial exome kits. Furthermore, Exome V7 targets all pathogenic variants in the genes included in the ACMG guidelines for secondary findings. A novel primer design algorithm results in an efficient design with a total size of only 48.2 Mbs.

So I guess this type of information should be available in the reference for whichever capture kit people use in their experiment. I am leaving this here in case someone in the future is looking for the genome assembly used to design their capture kit.