Question: Where can I download Clinical or Medical Exome?
18 months ago by
Karma200
India
Karma200 wrote:

I have read about the clinical exome in some research articles here, here and here. They describe approximately 3000 genes in the clinical exome. Are there any databases from which I can download the whole clinical exome data, such as gene name, location, variants and associated disease?

Thank You

modified 18 months ago by ptinto190 • written 18 months ago by Karma200

I feel that the term 'clinical exome' is very misleading and very ambiguous.

Clinical Exome Sequencing (CES) is merely exome sequencing, i.e., the sequencing of protein-coding genes. It is very misleading to call it 'clinical' because it has been shown time and time again that even intergenic mutations can play key roles in disease, even fully explain a disease mechanism in some cases.

See UCLA's definition here: http://pathology.ucla.edu/clinical-exome-sequencing

modified 18 months ago • written 18 months ago by Kevin Blighe41k

I was wondering if CES is somehow a subset of WES. Not sure how anyone got the 3000 number. Ideally, one would go (depending on how confident one is) from a targeted gene capture -> whole exome -> whole genome. Maybe the targeted part is what's being called the CES? In which case, CES would refer to the set of all genes that could contribute to a specific phenotype, and could be used on a slightly larger scale for personalized genomics.

written 18 months ago by RamRS21k

I think of CES as exome sequencing done for clinical research or diagnostic purposes. I may do only 20 genes and call it a clinical exome, if my interests are limited to those genes.

modified 18 months ago • written 18 months ago by genomax65k

My thoughts exactly.

written 18 months ago by RamRS21k

Yes, but in many cases, studies just show an association between a variant in a given exon and a disease, i.e., there's no concrete proof.

modified 18 months ago • written 18 months ago by Kevin Blighe41k

Just to expand on this for the OP: it really does not appear to be any different from standard exome sequencing, although one must be aware that different exome-seq capture kits target different regions and thus sequence different numbers of genes.

For the CES, which is validated by CLIA according to the authors in their published manuscript HERE, they state:

Exome capture was performed using SureSelect Human All Exon V2 Kit (Agilent Technologies) and sequencing was performed using the HiSeq 2000 for a 50-bp paired-end run or HiSeq 2500 for a 100-bp paired-end run (both from Illumina).

An average of 60 million independent paired reads or 9.7 Gb of sequence data were generated per sample to provide a mean 100-fold coverage across the RefSeq protein-coding exons and flanking intronic sequence (±2 bp) with more than 93% of these bases and 94% of all reported Human Gene Mutation Database (HGMD) variant positions with a depth of coverage 10× or more.

Thus, we estimated that CES has a more than 93% chance of observing clinically relevant single nucleotide or small indel (insertions and deletions) variant(s).

The mitochondrial genome is not specifically captured, but as a byproduct of being present at a high copy number, 99%

modified 18 months ago • written 18 months ago by Kevin Blighe41k

94% of HGMD

AKA exonic variants. SMH

written 18 months ago by RamRS21k

18 months ago by
ptinto190
ptinto190 wrote:

Short answer:

Take the list of genes from the probes used for capture (all vendors provide them) and use BioMart to get all the annotations you have mentioned.

Long:

The definition of clinical exome is vague and confusing, and in my opinion it should not be used.

First you need to choose what clinical exome means to you:

a) The whole exome? Then the attribute "clinical" does not mean anything; perhaps it only states that it is going to be used in a diagnostic environment, and the threshold for base coverage would probably be over 40.

b) You do an exome but only analyze the clinically relevant genes (400-7000, the ones in OMIM, HGMD, ...). Either you do a whole exome but only describe mutations in those genes (and only in the core RefSeq transcripts), or you use an ad-hoc exome capture set like Agilent focus-exome or Illumina TruSightOne that contains probes only for these genes, which saves on sequencing.

c) It is the whole exome, but the clinically relevant genes have improved capture and an effort has been made to achieve maximum coverage on them, as in Agilent SureSelect Clinical Research V2.
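
For the first variant of option b) (sequence the whole exome, report only the relevant genes), the restriction is essentially a filter on the annotated variant list. A minimal sketch, assuming each variant already carries a gene symbol; the panel, the record layout and the positions below are invented for illustration and do not come from any vendor file:

```python
# Hypothetical gene panel; in practice this would come from OMIM, HGMD
# or a vendor gene list.
PANEL_GENES = {"BRCA1", "BRCA2", "CFTR"}


def filter_to_panel(variants, panel_genes):
    """Keep only variants whose annotated gene symbol is in the panel."""
    panel = {gene.upper() for gene in panel_genes}
    return [v for v in variants if v["gene"].upper() in panel]


# Made-up variant records (positions are placeholders, not real coordinates).
variants = [
    {"chrom": "17", "pos": 100, "gene": "BRCA1"},
    {"chrom": "1", "pos": 200, "gene": "ISG15"},
    {"chrom": "7", "pos": 300, "gene": "CFTR"},
]

kept = filter_to_panel(variants, PANEL_GENES)
print([v["gene"] for v in kept])
```

Matching on gene symbols is the simplest choice, but symbols change over time, so stable gene IDs may be safer for a real list.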

Now that you have chosen your capture set, how do you obtain the genes and data you want?

The list of genes is usually stored in files called manifests, or in files in BED format. You can obtain the Illumina TruSightOne one from here, and the Agilent ones from SureDesign: register there, go to the "find_design" tab, select "SureSelect DNA", and click on the "Agilent Catalog" tab.

If you don't have bioinformatics knowledge, you can extract the gene column in Excel and use BioMart (filters -> gene -> input external references [select "gene name" from the options]), pasting the names in groups of 500.
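
If you would rather avoid Excel for this step, the same extraction can be scripted. A minimal sketch, assuming a tab-separated BED-like file with the gene symbol in the fourth column; that layout is an assumption (vendor files differ, and some pack several annotations into one field), and the example lines are invented:

```python
def genes_from_bed(lines, name_col=3):
    """Collect unique gene names from tab-separated BED-like lines.

    name_col is the 0-based column assumed to hold the gene symbol;
    check this against the actual vendor file.
    """
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) > name_col and fields[name_col]:
            genes.add(fields[name_col])
    return sorted(genes)


def batches(items, size=500):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Invented example content standing in for a vendor BED file.
example = [
    'track name="hypothetical_panel"',
    "chr1\t100\t200\tGENE_A",
    "chr1\t300\t400\tGENE_A",
    "chr2\t500\t600\tGENE_B",
]

genes = genes_from_bed(example)
for chunk in batches(genes, size=500):
    # Each chunk can be pasted into the BioMart filter box.
    print("\n".join(chunk))
```

For a real file, pass an open file handle instead of the `example` list, for instance `genes_from_bed(open(path))`, where `path` is wherever you saved the download.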

written 18 months ago by ptinto190

Gene names and Excel ... just be a bit careful: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7

written 18 months ago by RamRS21k

Yes, you are right, but not only for gene names. As a golden rule, NEVER open a CSV with 'General' applied to the columns in the importer. Change them to text and then give each column the format it needs afterwards. Worse than the gene names are the date fields, where you can face the American-British conversion, and which become an integer if you change date to general in the Excel sheet :-(. Not to mention decimal separators between Spanish and English (, vs .)
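
The same "import as text, convert later" rule is easy to follow outside Excel. A minimal sketch with Python's standard `csv` module, which returns every field as a string, so nothing is reinterpreted until you convert it yourself. The data, the column names and the semicolon delimiter are made up for illustration, and the decimal conversion assumes the column contains no thousands separators:

```python
import csv
import io
from datetime import datetime

# Made-up file content: semicolon-delimited, decimal commas,
# day/month/year dates.
raw = "gene;date;score\nSEPT2;03/04/2017;3,14\nMARCH1;12/01/2017;0,5\n"

# csv never guesses types: gene symbols such as SEPT2 stay as text.
rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))

for row in rows:
    # State the date convention explicitly instead of relying on a locale.
    row["date"] = datetime.strptime(row["date"], "%d/%m/%Y").date()
    # Convert the decimal comma explicitly (assumes no thousands separators).
    row["score"] = float(row["score"].replace(",", "."))

print(rows[0])
```

The point is that every conversion is written down: the date format and the decimal convention are stated in the code rather than inherited from whatever the machine's regional settings happen to be.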

written 18 months ago by ptinto190

Bom dia / Buenos días, that last point is annoying because I use my operating system in Portuguese. Some of my email accounts are in Spanish, and I also use one email in Italian. The Latin nations use commas as decimal points

modified 18 months ago • written 18 months ago by Kevin Blighe41k

The horrible situation in laboratories in comma-decimal nations is that the computers attached to the instruments (such as sequencers) are in English, so sooner or later someone will open Excel there and paste in data from a text file with commas, creating a mixed column, and someone later on will end up analysing a dataset with a numeric column in which decimal and thousands separators are mixed up! Resistance is futile: I have all my computers in the lab set to English (operating system and Office). It saves a lot of headaches.
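
A cheap sanity check before analysing such a column is to look for both conventions in the same column. A minimal sketch, assuming the values are still available as text; the patterns are deliberately simple, and the check cannot tell whether "1,234" is a decimal or a thousands-grouped integer, it only flags that both styles occur:

```python
import re

# Simple patterns: digits, one separator, digits.
DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")
DECIMAL_POINT = re.compile(r"^-?\d+\.\d+$")


def separator_styles(values):
    """Return the set of separator styles ('comma', 'point') seen in values."""
    styles = set()
    for value in values:
        value = value.strip()
        if DECIMAL_COMMA.match(value):
            styles.add("comma")
        elif DECIMAL_POINT.match(value):
            styles.add("point")
    return styles


# Made-up column of the kind produced by pasting between locales.
column = ["0.25", "1,5", "12", "3.0"]
styles = separator_styles(column)
if len(styles) > 1:
    print("Mixed decimal separators found:", sorted(styles))
```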

modified 18 months ago • written 18 months ago by ptinto190

You have no idea how challenging it is when someone needs assistance during the hands-on training we offer and they have a non-US/UK format keyboard. Good luck finding / etc :)

modified 18 months ago • written 18 months ago by genomax65k