Question

VEP outputs similar INFO CSQ data multiple times

1

4.5 years ago

magnolia ▴ 30

Hi,

I'm using the example data shown at https://asia.ensembl.org/info/docs/tools/vep/vep_formats.html#default to test VEP. I don't know if it matters, but I also added some extra output fields and ClinVar as a custom source. I get multiple CSQ entries for each variant. Most are identical, but sometimes even the gene symbol differs. Why is this happening, and how can I decide which entry is the "actual" one?

Thanks!

Input Data

1   881907    881906    -/C   +
5   140532    140532    T/C   +
12  1017956   1017956   T/A   +

18 Lines of INFO CSQ Data For 12:1017956-1017956

12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|non_coding_transcript_exon_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|downstream_gene_variant|||||||||||

VEP VCF TSV ensembl annotation • 2.4k views

ADD COMMENT • link updated 2.1 years ago by LayneSadler ▴ 90 • written 4.5 years ago by magnolia ▴ 30

score 4 · Answer 1 · 2020-05-05

4

4.5 years ago

Pierre Lindenbaum 164k

It would be interesting to see the definition of the fields in the header (##INFO=<ID=CSQ..>), but my guess is that this result is the output for multiple transcripts.

For some reason, the transcript ID is not displayed.
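To make the header point concrete, here is a minimal Python sketch that reads the field names from a CSQ header line and labels each CSQ entry with them. The header text, the field order, and the transcript IDs (TX_A, TX_B) are made-up placeholders for illustration, not the asker's actual header; only the "Format: a|b|c" convention inside the Description is taken from how VEP writes VCF output.

```python
import re


def csq_field_names(header_line):
    """Return the field names listed after 'Format: ' in a CSQ INFO header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format: ' section found in header line")
    return match.group(1).strip().split("|")


def parse_csq(csq_value, field_names):
    """Turn a comma-separated CSQ value into one dict per annotation entry."""
    entries = []
    for entry in csq_value.split(","):
        values = entry.strip().split("|")
        entries.append(dict(zip(field_names, values)))
    return entries


# Hypothetical header and CSQ value; TX_A and TX_B are placeholder transcript IDs.
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Feature">'
)
csq = "A|stop_lost|HIGH|WNK1|TX_A,A|downstream_gene_variant|MODIFIER|RAD52|TX_B"

records = parse_csq(csq, csq_field_names(header))
for record in records:
    print(record["SYMBOL"], record["Feature"], record["Consequence"])
```

With a transcript column (Feature in this sketch) included, entries that otherwise look identical become distinguishable.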

ADD COMMENT • link 4.5 years ago by Pierre Lindenbaum 164k

0

Thank you. I think it's because of the transcripts as well. I didn't include the transcript ID in the fields, which is why it isn't shown in CSQ. Is there any way to disable per-transcript output? I only want the rsID, gnomAD, ClinVar, and the amino acid change. It would also be great to have only one CSQ entry for each position.

ADD REPLY • link 4.5 years ago by magnolia ▴ 30

2

Hi Magnolia,

Pierre is correct in saying that the multiple rows in your output correspond to multiple transcripts. A single variant can have multiple predicted consequences (on multiple transcripts of a single gene, or even on transcripts of two or more genes).
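As a rough illustration of why the rows look duplicated, the Python sketch below counts how often each (gene symbol, consequence) pair occurs in a CSQ value. The column positions (symbol at index 1, consequence at index 9) are an assumption read off the pipe-delimited lines in the question, not a VEP default; the three sample entries are copied from that output.

```python
from collections import Counter

# Assumed column positions in the custom layout shown in the question.
SYMBOL_IDX = 1
CONSEQUENCE_IDX = 9


def count_gene_consequence(csq_value):
    """Count CSQ entries per (gene symbol, consequence) pair."""
    counts = Counter()
    for entry in csq_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        fields = entry.split("|")
        counts[(fields[SYMBOL_IDX], fields[CONSEQUENCE_IDX])] += 1
    return counts


# Three of the 18 entries from the question, joined the same way (comma + newline).
sample = ",\n".join([
    "12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||",
    "12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||",
    "12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||",
])

for (symbol, consequence), n in count_gene_consequence(sample).items():
    print(symbol, consequence, n)
```

Applied to all 18 entries in the question, the same tally gives RAD52 downstream_gene_variant 9 times, WNK1 stop_lost 5 times, WNK1 downstream_gene_variant 3 times, and WNK1 non_coding_transcript_exon_variant once, which is what you would expect if each row is a different transcript whose ID is simply not printed.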

You can use the different filtering options when running VEP, such as --pick and --per_gene, to restrict your results: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#filt

You could also use the filter_vep script to filter your output with multiple rows: http://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html
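To show the difference between "one row per variant" and "one row per gene", here is a deliberately simplified Python sketch. It ranks entries by IMPACT only, which is an assumption made for illustration: VEP's real --pick applies its own ordered list of criteria, and --per_gene is documented as reporting the most severe consequence per gene, so treat this as a toy model of the idea and not as a reimplementation. The function names and dict keys are hypothetical.

```python
# Lower rank means more severe; these are the four VEP IMPACT categories.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def one_per_variant(entries):
    """Keep a single entry for the variant: the first with the most severe IMPACT."""
    return min(entries, key=lambda e: IMPACT_RANK[e["IMPACT"]])


def one_per_gene(entries):
    """Keep one entry per gene symbol: the first with the most severe IMPACT."""
    best = {}
    for entry in entries:
        current = best.get(entry["SYMBOL"])
        if current is None or IMPACT_RANK[entry["IMPACT"]] < IMPACT_RANK[current["IMPACT"]]:
            best[entry["SYMBOL"]] = entry
    return list(best.values())


# Distinct annotations for 12:1017956 T/A, taken from the output in the question.
entries = [
    {"SYMBOL": "RAD52", "IMPACT": "MODIFIER", "Consequence": "downstream_gene_variant"},
    {"SYMBOL": "WNK1", "IMPACT": "HIGH", "Consequence": "stop_lost"},
    {"SYMBOL": "WNK1", "IMPACT": "MODIFIER", "Consequence": "downstream_gene_variant"},
]

picked = one_per_variant(entries)
per_gene = one_per_gene(entries)
print(picked["SYMBOL"], picked["Consequence"])
print([(e["SYMBOL"], e["Consequence"]) for e in per_gene])
```

In this toy model the per-variant choice keeps only the WNK1 stop_lost row, while the per-gene choice keeps one row for RAD52 and one for WNK1, so the variant itself is retained in both cases.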

ADD REPLY • link 4.5 years ago by Ben Moore ★ 2.4k

0

Using the pick options really worked. Thank you!

ADD REPLY • link 4.5 years ago by magnolia ▴ 30

0

A warning, if I understand it correctly: it seems like --per_gene would throw out all variants except for the one at the position with the highest consequence, whereas --pick would keep one per position.

ADD REPLY • link 2.1 years ago by LayneSadler ▴ 90