VEP outputs similar INFO CSQ data multiple times
magnolia, 4.0 years ago

Hi,

I'm testing VEP with the example data shown at https://asia.ensembl.org/info/docs/tools/vep/vep_formats.html#default. I don't know if it matters, but I also added some fields and ClinVar as a custom source. I get multiple CSQ entries for each variant. Most of them are identical, but sometimes even the gene symbol differs. Why is this happening, and how can I decide which entry is the "actual" one?

Thanks!

Input Data

1   881907    881906    -/C   +
5   140532    140532    T/C   +
12  1017956   1017956   T/A   +

18 Lines of INFO CSQ Data For 12:1017956-1017956

12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|non_coding_transcript_exon_variant|||||||||||,
12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|downstream_gene_variant|||||||||||,
12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|downstream_gene_variant|||||||||||
4.0 years ago

It would be interesting to see the field definitions in the header line ##INFO=<ID=CSQ..>, but my guess is that these rows are the consequences for multiple transcripts.

For some reason, the transcript ID is not displayed.
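To see which value belongs to which field, the CSQ sub-field names can be read from the "Format:" list in the CSQ header line and paired with each comma-separated block. The Python sketch below is only an illustration: the header is hypothetical (its field list is modelled on the columns visible in the question, plus Feature, the column VEP uses for the transcript ID), and ENST_A / ENST_B are placeholder transcript IDs, not real ones.

```python
import re

# Hypothetical CSQ header line; the field list is an assumption, not the
# poster's real header.
HEADER = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Uploaded_variation|SYMBOL|Feature|Allele|'
    'Amino_acids|Codons|IMPACT|STRAND|Consequence">'
)


def csq_fields(header_line: str) -> list[str]:
    """Return the ordered CSQ sub-field names from the ##INFO=<ID=CSQ,...> line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in CSQ header")
    return match.group(1).split("|")


def parse_csq(info_value: str, fields: list[str]) -> list[dict[str, str]]:
    """Split a CSQ INFO value into one dict per comma-separated consequence block."""
    records = []
    for block in info_value.split(","):
        values = block.split("|")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
        records.append(dict(zip(fields, values)))
    return records


fields = csq_fields(HEADER)
# Two blocks shaped like the question's output, with placeholder transcript IDs.
csq = (
    "12_1017956_T/A|WNK1|ENST_A|A|*/K|Tag/Aag|HIGH|1|stop_lost,"
    "12_1017956_T/A|RAD52|ENST_B|A|||MODIFIER|-1|downstream_gene_variant"
)
for rec in parse_csq(csq, fields):
    print(rec["SYMBOL"], rec["Feature"], rec["Consequence"])
```

Once Feature is included in the output fields, rows that looked identical should differ by transcript ID.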


Thank you. I think it's because of the transcripts as well. I didn't include the transcript ID in the fields, which is why it isn't shown in the CSQ. Is there any way to disable per-transcript output? I only want the rsID, gnomAD, ClinVar and amino acid change, and ideally only one CSQ entry per position.
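One post-processing idea, sketched in Python under stated assumptions: keep only the sub-fields of interest and drop rows that become identical. Existing_variation (where VEP reports known variant IDs such as rsIDs) and Amino_acids are standard VEP CSQ fields; the gnomAD column name depends on the VEP and cache version (gnomAD_AF is assumed here), and ClinVar_CLNSIG assumes a custom annotation given the short name ClinVar. The records are invented placeholders.

```python
# Assumed column names (see lead-in); adjust to the Format list in your header.
WANTED = ["Existing_variation", "gnomAD_AF", "ClinVar_CLNSIG", "Amino_acids"]


def project_unique(records, wanted=WANTED):
    """Keep only the wanted keys and drop rows that become identical, preserving order."""
    seen = set()
    out = []
    for rec in records:
        row = tuple(rec.get(key, "") for key in wanted)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(wanted, row)))
    return out


# Invented placeholder records: three transcripts of one variant.
records = [
    {"Feature": "tx1", "Existing_variation": "rs_example", "gnomAD_AF": "0.001",
     "ClinVar_CLNSIG": "", "Amino_acids": "*/K"},
    {"Feature": "tx2", "Existing_variation": "rs_example", "gnomAD_AF": "0.001",
     "ClinVar_CLNSIG": "", "Amino_acids": "*/K"},
    {"Feature": "tx3", "Existing_variation": "rs_example", "gnomAD_AF": "0.001",
     "ClinVar_CLNSIG": "", "Amino_acids": ""},
]
for row in project_unique(records):
    print(row)
```

Because the amino acid change is transcript-specific, this can still leave more than one row per variant (two here), so it does not replace choosing one transcript.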


Hi Magnolia,

Pierre is correct that the multiple rows in your output correspond to multiple transcripts. A single variant can have multiple predicted consequences (on several transcripts of a single gene, or even on transcripts of two or more genes).
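One way to check this against the output in the question is to group the 18 CSQ blocks by gene symbol, IMPACT and consequence. The sketch uses the four distinct strings from the question with the trailing empty fields dropped, and assumes positions 1, 7 and 9 hold SYMBOL, IMPACT and Consequence, since the real header was not posted.

```python
from collections import Counter

# The four distinct CSQ strings from the question (trailing empty fields dropped).
BLOCKS = {
    "R": "12_1017956_T/A|RAD52|12:1017956|A|T|||MODIFIER|-1|downstream_gene_variant",
    "S": "12_1017956_T/A|WNK1|12:1017956|A|T|*/K|Tag/Aag|HIGH|1|stop_lost",
    "D": "12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|downstream_gene_variant",
    "N": "12_1017956_T/A|WNK1|12:1017956|A|T|||MODIFIER|1|non_coding_transcript_exon_variant",
}
# The 18 lines in the order they appear in the question.
csq_lines = [BLOCKS[key] for key in "RSSRRRRRSRSDSRNRDD"]

# Assumed positions: index 1 = SYMBOL, 7 = IMPACT, 9 = Consequence.
counts = Counter(
    (f[1], f[7], f[9]) for f in (line.split("|") for line in csq_lines)
)
for (symbol, impact, consequence), n in counts.most_common():
    print(f"{n:2d}  {symbol:6s} {impact:8s} {consequence}")
```

The 18 rows collapse to four combinations: 9 RAD52 downstream_gene_variant, 5 WNK1 stop_lost, 3 WNK1 downstream_gene_variant and 1 WNK1 non_coding_transcript_exon_variant. Repeated rows that differ only in a hidden transcript ID are consistent with one row per transcript.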

You can use the different filtering options when running VEP, such as --pick and --per_gene, to restrict your results: http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#filt
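As a rough illustration of what "one consequence block per variant" means, here is a simplified Python stand-in that ranks blocks by IMPACT alone. It is not VEP's --pick algorithm, which applies a longer list of criteria whose order can be changed with --pick_order; it only shows the shape of the result. The records are taken from the question's output.

```python
# Simplified stand-in for picking one block per variant: rank by IMPACT only.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def pick_per_variant(records):
    """Return one record per Uploaded_variation, keeping the most severe IMPACT.

    Ties keep the first record seen; unknown IMPACT values rank last.
    """
    best = {}
    for rec in records:
        key = rec["Uploaded_variation"]
        rank = IMPACT_RANK.get(rec["IMPACT"], len(IMPACT_RANK))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]


records = [
    {"Uploaded_variation": "12_1017956_T/A", "SYMBOL": "RAD52",
     "IMPACT": "MODIFIER", "Consequence": "downstream_gene_variant"},
    {"Uploaded_variation": "12_1017956_T/A", "SYMBOL": "WNK1",
     "IMPACT": "HIGH", "Consequence": "stop_lost"},
    {"Uploaded_variation": "12_1017956_T/A", "SYMBOL": "WNK1",
     "IMPACT": "MODIFIER", "Consequence": "non_coding_transcript_exon_variant"},
]
for rec in pick_per_variant(records):
    print(rec["Uploaded_variation"], rec["SYMBOL"], rec["Consequence"])
```

For these three records it keeps only the WNK1 stop_lost block.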

You could also use the filter_vep script to filter your output with multiple rows: http://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html
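filter_vep can also remove individual consequence blocks inside the CSQ field rather than whole variant lines (its documentation describes an --only_matched option for this). The Python sketch below imitates that idea for a single CSQ value. The three-field layout is a simplification assumed for illustration, and the predicate plays the role of a filter such as "IMPACT is HIGH".

```python
def filter_csq_blocks(csq_value, fields, predicate):
    """Return the CSQ value with only the blocks whose parsed record passes predicate."""
    kept = []
    for block in csq_value.split(","):
        record = dict(zip(fields, block.split("|")))
        if predicate(record):
            kept.append(block)
    return ",".join(kept)


# Simplified three-field layout assumed for illustration.
FIELDS = ["SYMBOL", "IMPACT", "Consequence"]
csq = ("RAD52|MODIFIER|downstream_gene_variant,"
       "WNK1|HIGH|stop_lost,"
       "WNK1|MODIFIER|non_coding_transcript_exon_variant")

# Roughly the idea of a filter such as "IMPACT is HIGH".
print(filter_csq_blocks(csq, FIELDS, lambda rec: rec["IMPACT"] == "HIGH"))
# prints WNK1|HIGH|stop_lost
```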


Using pick options really worked. Thank you!


A warning? If I understand correctly, --per_gene would throw out all variants except the one at the position with the most severe consequence, whereas --pick would keep one per position.
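According to the VEP options documentation, --per_gene outputs only the most severe consequence per gene (ranked the same way as --pick), while --pick keeps one consequence block per variant. The simplified Python sketch below ranks by IMPACT only, which is an assumption rather than VEP's full ranking, and applies a per-gene selection to records from the question. For this single variant it keeps one RAD52 row and one WNK1 row instead of discarding either gene.

```python
# Simplified per-gene selection: rank by IMPACT only (not VEP's full criteria).
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def most_severe_per_gene(records):
    """Keep the most severe record for each (variant, gene) pair, in first-seen order."""
    best = {}
    for rec in records:
        key = (rec["Uploaded_variation"], rec["SYMBOL"])
        rank = IMPACT_RANK.get(rec["IMPACT"], len(IMPACT_RANK))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]


records = [
    {"Uploaded_variation": "12_1017956_T/A", "SYMBOL": "RAD52",
     "IMPACT": "MODIFIER", "Consequence": "downstream_gene_variant"},
    {"Uploaded_variation": "12_1017956_T/A", "SYMBOL": "WNK1",
     "IMPACT": "MODIFIER", "Consequence": "non_coding_transcript_exon_variant"},
    {"Uploaded_variation": "12_1017956_T/A", "SYMBOL": "WNK1",
     "IMPACT": "HIGH", "Consequence": "stop_lost"},
]
for rec in most_severe_per_gene(records):
    print(rec["SYMBOL"], rec["IMPACT"], rec["Consequence"])
```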
