Question

Forum: What are the pain points in genomic variant interpretation/annotation processes?

2

2.9 years ago

Ryangguk Kim ▴ 90

Hi, I develop genomic variant annotation/interpretation tools. I'd like to learn about the pain points and needs people have with such tools. Can you point out where your pain points are when doing variant annotation/interpretation?

Also, if you wouldn't mind, could anyone chat with me for just 15 minutes so that I can listen to you talk about what you do with variant analysis?

analysis variant • 2.5k views

ADD COMMENT • link updated 2.8 years ago by Zhenyu Zhang ★ 1.2k • written 2.9 years ago by Ryangguk Kim ▴ 90

0

I develop genomic variant annotation/interpretation tools

Can you point us to a few tools you've developed?

ADD REPLY • link 2.9 years ago by Ram 43k

1

Sure, mostly it has been https://github.com/KarchinLab/open-cravat

ADD REPLY • link 2.9 years ago by Ryangguk Kim ▴ 90

0

There was a comment about the VCF format. We were writing a VCF parser and had some headaches due to a couple of variant caller-specific conventions/modifications that threw the parser off. What problems did you encounter when dealing with the VCF format?
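To make the parser problem concrete, here is a minimal sketch of a tolerant parser for the eight fixed VCF columns. The thread does not say which caller conventions caused trouble, so the irregularities handled here (flag-style INFO keys without `=`, `.` for missing values, stray semicolons) are only assumed examples, and `parse_info` / `parse_vcf_line` are hypothetical names, not functions from any existing tool.

```python
from typing import Dict, List, Union

InfoValue = Union[bool, str, List[str]]


def parse_info(field: str) -> Dict[str, InfoValue]:
    """Parse a VCF INFO column, tolerating a few common irregularities."""
    info: Dict[str, InfoValue] = {}
    if field in ("", "."):            # missing INFO column
        return info
    for entry in field.split(";"):
        if not entry:                 # stray or trailing semicolon
            continue
        key, sep, value = entry.partition("=")
        if not sep:                   # flag-style key with no value
            info[key] = True
        elif "," in value:            # multi-valued entry, kept as strings
            info[key] = value.split(",")
        else:
            info[key] = value
    return info


def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line (sample columns ignored)."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 tab-separated columns, got {len(cols)}")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,
        "ref": ref,
        "alts": [] if alt == "." else alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": [] if flt in ("", ".") else flt.split(";"),
        "info": parse_info(info),
    }
```

Values are deliberately left as strings, because typing them properly requires the header's INFO definitions, which a caller may or may not write consistently.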

ADD REPLY • link 2.9 years ago by Ryangguk Kim ▴ 90

0

Since this is an open-ended question, I changed the type to Forum. Consider editing the title to something like "What are the pain points in genomic .." to make it clearer.

ADD REPLY • link 2.9 years ago by GenoMax 141k

0

Thanks. I have edited the title.

ADD REPLY • link 2.9 years ago by Ryangguk Kim ▴ 90

0

I'm not in this area, but I get the feeling that the non-ML tools suck at flagging deleterious variants in non-coding regions, and the ML tools are all overfit to a particular disease.

ADD REPLY • link 2.9 years ago by Jeremy Leipzig 22k

score 3 · Answer 1 · 2021-06-01

3

2.9 years ago

Zhenyu Zhang ★ 1.2k

I have generated more than 100,000 VCF files. Some pain points are:

To annotate multiple variants on the same transcript.
To represent complicated SVs and CNVs.
To annotate complicated variants, including SVs, CNVs and some INDELs.
Variant normalization.
HGVS sucks (but there are no better alternatives).
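On the variant normalization point, a minimal sketch may help show why it is fiddly. The code below trims shared bases and left-aligns a single biallelic variant, in the style of the commonly used trim-and-left-align approach. It assumes the reference sequence fits in a plain Python string and that positions are 1-based as in VCF; `normalize_variant` is a hypothetical name, and real tools also handle multi-allelic records, symbolic alleles and indexed FASTA access, which this does not.

```python
from typing import Tuple


def normalize_variant(pos: int, ref: str, alt: str, ref_seq: str) -> Tuple[int, str, str]:
    """Trim and left-align one biallelic variant.

    pos is 1-based; ref_seq is the whole contig sequence as a string.
    """
    if not ref or not alt or ref == alt:
        raise ValueError("REF and ALT must be non-empty and different")
    if ref_seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")

    # Drop shared trailing bases; when an allele would become empty,
    # first extend both alleles one reference base to the left.
    while ref[-1] == alt[-1]:
        if min(len(ref), len(alt)) == 1:
            if pos == 1:
                break  # cannot extend past the start of the contig
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
        ref, alt = ref[:-1], alt[:-1]

    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    return pos, ref, alt
```

For example, on the toy contig `GGGCACACACAGGG`, a two-base deletion written as position 8, `CAC` to `C`, and the same event written at any other repeat unit both normalize to position 3, `GCA` to `G`, which is what makes records from different callers comparable.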

Btw, if you are looking for buddies in this field, I strongly suggest you join the GA4GH Variant Annotation and Variant Representation working groups.

ADD COMMENT • link 2.9 years ago by Zhenyu Zhang ★ 1.2k

0

Thanks Zhenyu. The list makes sense. I participated in some GA4GH calls (of the two working groups) and events.

ADD REPLY • link 2.9 years ago by Ryangguk Kim ▴ 90

0

By the way, I have a follow-up question, if I may - how were those VCF files used downstream in your work?

ADD REPLY • link 2.9 years ago by Ryangguk Kim ▴ 90

0

We (GDC) make MAFs and share all data with the research community.

ADD REPLY • link 2.8 years ago by Zhenyu Zhang ★ 1.2k

score 1 · Answer 2 · 2021-05-31

1

2.9 years ago

Kevin Blighe 87k

Every program should:

output variant consequences over all transcript isoforms
output variant annotation in HGVS
output the tissue in which each isoform is most expressed
output Ensembl gene IDs and HGNC gene symbols
indicate the orientation of the bases relative to the reference genome (as we know, many 'variants' are the very reference bases in hg19 and hg38)

ADD COMMENT • link 2.9 years ago by Kevin Blighe 87k

0

Thanks Kevin. I have a couple of follow-up questions.

When you get variant consequences over all transcript isoforms, what do you do with them downstream? I saw a few different approaches: use the most deleterious consequence, use the consequence of a pre-determined "representative" transcript such as MANE, etc. Is it related to the expression level in each tissue you mentioned, such as seeing if the dominant isoform in a tissue had a deleterious consequence?
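To pin down the two approaches mentioned above, here is a rough sketch of both selection strategies over per-transcript annotations. Everything in it is illustrative: the `TranscriptConsequence` record, the `is_representative` flag (standing in for something like a MANE label supplied by the annotation source) and the short severity list are assumptions, not the output format or ranking of any particular annotator.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative ordering only, most severe first; real annotators ship
# their own, much longer rankings.
SEVERITY_ORDER = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


@dataclass
class TranscriptConsequence:
    transcript_id: str
    consequence: str
    is_representative: bool = False  # hypothetical flag, e.g. a MANE-style label


def most_severe(items: List[TranscriptConsequence]) -> TranscriptConsequence:
    """Strategy 1: pick the most deleterious consequence across all isoforms."""
    # Unknown terms sort last rather than raising.
    return min(items, key=lambda c: RANK.get(c.consequence, len(RANK)))


def on_representative(items: List[TranscriptConsequence]) -> Optional[TranscriptConsequence]:
    """Strategy 2: report only the consequence on the representative transcript."""
    return next((c for c in items if c.is_representative), None)
```

The two strategies can disagree for the same variant, for instance a missense change on the representative transcript and a stop gain on a minor isoform, which is one reason to keep the full per-isoform list available downstream.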

Regarding the tissue in which each isoform is most expressed, can you point me to some data sources that have such information?

ADD REPLY • link 2.9 years ago by Ryangguk Kim ▴ 90

1

I am no longer directly involved in the variant interpretation part; however, the clinical scientists with whom I worked [in NHS England] checked variant consequences over all known isoforms via a program (I believe Alamut). They would use literature searches to determine whether a consequence over a given isoform was important before signing the report. The final decision then rested with the referring doctor, sister laboratory, Lab Director, or Genetic Counsellor, depending on the exact origin of the sample.

My role, as Lead Bioinformatician, was to simply output the variant listing and ensure that nothing was missed, after which they were content to take care of everything. A simple run of GATK, DeepVariant, SAMtools, etc. will miss quite a few clinically-actionable variants.

I would personally not be interested in seeing just the most deleterious consequence, in part due to my lack of trust in NGS data, and also because I understand just how complex the genome is.

Regarding MANE, there are Ensembl representatives here and I believe MANE is already in use after there was an initial poll ~2 years ago. Cannot find it right now.