Question

Classify extracted variants


4.6 years ago

ola.helenamartins

Hello everyone,

I extracted the variants from all my data using chromosome coordinates, and now I have a very large text file in the following format:

SAMPLE CHROM POS ID REF ALT QUAL FILTER GT GQ DP
XXXX chr2 165990524 rs4667859 T C 256 PASS 1/1 66 23
YYYY chr2 165993939 rs139604390 G A 155 PASS 0/1 188 33

What would be the fastest way to annotate these variants, in particular to get each variant's consequence on the protein sequence (e.g. stop gained, missense, stop lost, frameshift)?

I tried VEP, but I am not sure which input format to use in this case. Any thoughts?

Thank you.

snp sequencing gene vep

Updated 4.6 years ago by zx8754
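For reference, VEP reports the consequence types listed in the question as Sequence Ontology terms: `stop_gained`, `missense_variant`, `stop_lost` and `frameshift_variant`. Below is a minimal Python sketch for keeping only those lines from VEP's default tab-delimited output. It assumes the usual `##` meta lines and a `#Uploaded_variation` header row that contains a `Consequence` column holding comma-separated terms. The function name `filter_vep_output` is hypothetical; VEP's own `filter_vep` script can do the same job.

```python
# Sequence Ontology terms VEP uses for the consequences named in the question.
TARGET_TERMS = {"stop_gained", "missense_variant", "stop_lost", "frameshift_variant"}


def filter_vep_output(lines, targets=TARGET_TERMS):
    """Yield data lines from VEP's default tab-delimited output whose
    Consequence column shares at least one term with `targets`."""
    consequence_idx = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#"):
            # Column header row, e.g. "#Uploaded_variation<TAB>Location<TAB>...<TAB>Consequence..."
            consequence_idx = fields.index("Consequence")
            continue
        if consequence_idx is None:
            raise ValueError("no header line with a Consequence column was found")
        # One output line can carry several comma-separated terms.
        terms = set(fields[consequence_idx].split(","))
        if terms & targets:
            yield line
```

Because it accepts any iterable of lines, an open file handle can be passed directly, e.g. `with open("vep_output.txt") as fh:` followed by a loop over `filter_vep_output(fh)` (the file name is only an example).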


Your data appears to have been extracted from a VCF. As @nicolas pointed out, you can use VEP as long as the data is provided in a format it accepts, so you can either convert it back to VCF or use the VEP REST API.

Reply by JC, 4.6 years ago
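A minimal Python sketch of both routes suggested above. It assumes the extracted file is whitespace-separated, starts with the header line shown in the question, and has a single ALT allele per row. Per-sample rows are collapsed to unique sites, because VEP annotates sites rather than genotypes. `to_vcf` builds a sites-only VCF for the command-line tool; `rest_payloads` and `post_to_vep` target the `POST /vep/homo_sapiens/region` endpoint of the Ensembl REST API. All function names are hypothetical. The batch size of 200, dropping the `chr` prefix, and the default server are assumptions: check the endpoint's documented POST limit, and make sure the server matches the assembly of your coordinates (`rest.ensembl.org` serves GRCh38, `grch37.rest.ensembl.org` serves GRCh37).

```python
import json
import urllib.request


def read_sites(lines):
    """Collapse per-sample rows into unique (CHROM, POS, ID, REF, ALT) sites.

    Assumes whitespace-separated columns and a first line with the header
    SAMPLE CHROM POS ID REF ALT QUAL FILTER GT GQ DP.
    """
    rows = iter(lines)
    col = {name: i for i, name in enumerate(next(rows).split())}
    sites = {}
    for line in rows:
        f = line.split()
        if not f:
            continue  # skip blank lines
        key = (f[col["CHROM"]], int(f[col["POS"]]), f[col["REF"]], f[col["ALT"]])
        sites.setdefault(key, f[col["ID"]])  # keep the first ID seen for a site
    return sorted((c, p, vid, r, a) for (c, p, r, a), vid in sites.items())


def to_vcf(sites):
    """Render a minimal sites-only VCF (no sample columns) as a string."""
    out = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for chrom, pos, vid, ref, alt in sites:
        out.append(f"{chrom}\t{pos}\t{vid}\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(out) + "\n"


def rest_payloads(sites, batch_size=200):
    """Yield JSON request bodies of VCF-style variant strings, batch_size at a time."""
    strings = []
    for chrom, pos, vid, ref, alt in sites:
        # Assumption: Ensembl names chromosomes "2", not "chr2".
        name = chrom[3:] if chrom.startswith("chr") else chrom
        strings.append(f"{name} {pos} {vid} {ref} {alt} . . .")
    for i in range(0, len(strings), batch_size):
        yield json.dumps({"variants": strings[i:i + batch_size]})


def post_to_vep(body, server="https://rest.ensembl.org"):
    """POST one batch and return the decoded JSON (one entry per variant)."""
    request = urllib.request.Request(
        server + "/vep/homo_sapiens/region",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Writing `to_vcf(sites)` to a file gives input for `vep --input_file`. For the REST route, each entry in the response should carry a `most_severe_consequence` field alongside per-transcript `consequence_terms`.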

score 4 · Answer 1 · 2019-09-23


4.6 years ago

Nicolas Rosewick

Ensembl's VEP (options): https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html

Input formats readable by VEP: https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#input
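Besides VCF, the input-format page linked above describes VEP's default whitespace-separated format. Its columns are chromosome, start, end, the REF/ALT allele pair and strand, with an optional identifier as a sixth column. Insertions are written with start = end + 1 and `-` for the missing allele. A minimal Python sketch converting one VCF-style row to that format, assuming a single ALT allele, forward-strand coordinates, and VCF's leading padding base on indels (the function name is hypothetical):

```python
def to_vep_default(chrom, pos, ref, alt, var_id=None):
    """Convert one VCF-style variant to a line of VEP's default input format."""
    # Trim the shared leading base(s) that VCF adds before indels.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    start = pos
    # For an insertion ref is now empty, so end = start - 1 (VEP's insertion convention).
    end = pos + len(ref) - 1
    allele = f"{ref or '-'}/{alt or '-'}"
    fields = [chrom, str(start), str(end), allele, "+"]
    if var_id:
        fields.append(var_id)  # optional identifier column
    return "\t".join(fields)


# The first row from the question, then a padded insertion and a padded deletion.
print(to_vep_default("chr2", 165990524, "T", "C", "rs4667859"))
print(to_vep_default("chr2", 100, "T", "TA"))
print(to_vep_default("chr2", 100, "TT", "T"))
```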
