Why doesn't VEP take strand into account?
2
3.2 years ago
magnolia ▴ 20

Hi,

I'm annotating my variants with VEP. As a test, I picked a random rsID and created test input covering all possible alleles, both with and without strand information.

But I get both 1 and -1 strands in my output, and exactly the same number of rows in both cases. Since I don't have transcript information, I also tried the --pick_allele and --pick options, but I still mostly get -1 strand results.

What role does providing strand information play, then?

Input with strand

1   45795027    45795027    C/G +
1   45795027    45795027    C/T +
1   45795027    45795027    G/G +
1   45795027    45795027    T/T +

Input without strand

1   45795027    45795027    C/G
1   45795027    45795027    C/T
1   45795027    45795027    G/G
1   45795027    45795027    T/T
vep ensembl snp • 1.5k views
3.2 years ago
Emily 23k

The strand only affects how the VEP reads your alleles.

1   45795027    45795027    C/G +
1   45795027    45795027    G/C -

Can you see how the two lines above are the same variant, just reported on different strands? If you report all your alleles on the positive strand, you don't need to include strand, as the VEP will assume you've done this. If you report some on the reverse strand, you must include this information so the VEP knows which alleles to use.

Variants will always affect features on both strands, so the VEP will always report these, regardless of what strand you've used for your alleles.
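To illustrate the strand equivalence described in this answer, here is a minimal Python sketch that rewrites an allele string reported on the reverse strand as it would read on the forward strand. It is not VEP's own code: the function names `reverse_complement` and `to_forward_strand` are hypothetical, and treating `-` as a placeholder allele that is left untouched is an assumption made for this example.

```python
# Translation table mapping each unambiguous DNA base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_forward_strand(allele_string: str, strand: str = "+") -> str:
    """Rewrite a 'REF/ALT' allele string as it reads on the forward strand.

    Hypothetical helper for illustration only. A missing strand is treated
    as '+', matching the behaviour described in the answer above.
    """
    if strand in ("+", "1"):
        return allele_string
    if strand in ("-", "-1"):
        # Assumption: '-' is a placeholder allele and is kept unchanged.
        return "/".join(
            allele if allele == "-" else reverse_complement(allele)
            for allele in allele_string.split("/")
        )
    raise ValueError(f"unrecognised strand: {strand!r}")


# The two example lines describe the same forward-strand change.
print(to_forward_strand("C/G", "+"))
print(to_forward_strand("G/C", "-"))
```

Both calls yield the same allele string, which is why the two input lines are the same variant. The strand column changes how the alleles are read, not which features are reported.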

3.2 years ago

This is DNA you're talking about. The SNP will be found at that position in both strands. That strand column is being ignored.


Thank you for your answer. It's interesting that it will be ignored. The documentation says strand is required:

The default format is a simple whitespace-separated format (columns may be separated by space or tab characters), containing five required columns plus an optional identifier column

https://m.ensembl.org/info/docs/tools/vep/vep_formats.html
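For readers who want to see the column layout concretely, here is a rough Python sketch that splits one line of this whitespace-separated format into named fields. It is only an illustration, not VEP's parser: `VepLine` and `parse_default_line` are hypothetical names, and filling in `+` when the strand column is absent is an assumption based on the behaviour described elsewhere in this thread rather than on the quoted documentation.

```python
from typing import NamedTuple, Optional


class VepLine(NamedTuple):
    """One record of the default input format (hypothetical container)."""
    chrom: str
    start: int
    end: int
    alleles: str               # allele string such as "C/G"
    strand: str                # "+" or "-"
    identifier: Optional[str]  # optional sixth column


def parse_default_line(line: str) -> VepLine:
    """Split a whitespace-separated input line into named fields."""
    fields = line.split()  # splits on any run of spaces or tabs
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}")
    chrom, start, end, alleles = fields[:4]
    # Assumption: a missing strand column is read as the forward strand.
    strand = fields[4] if len(fields) > 4 else "+"
    identifier = fields[5] if len(fields) > 5 else None
    return VepLine(chrom, int(start), int(end), alleles, strand, identifier)


with_strand = parse_default_line("1   45795027    45795027    C/G +")
without_strand = parse_default_line("1   45795027    45795027    C/G")
print(with_strand == without_strand)
```

Under that assumption, the lines with and without a `+` strand column parse to identical records, which is consistent with the identical output reported in the question.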


Let me wildly guess: I think the strand information is necessary for making statements about the annotation of variants, such as overlap with promoters/TSS/TES etc. The strand then tells the tool whether the start coordinate is actually the start of an overlapping feature on the plus strand rather than the end of a feature on the minus strand. For the mutations themselves it does not matter since, as pointed out, DNA is double-stranded and therefore any mutation always occurs on both strands.


Oh okay. Thank you for the explanation. My results also have problems, like variants being reported even though the input is homozygous reference, especially when there are multiple alternative alleles at that position. I thought maybe strand played a role in this, but no, obviously it has a different cause. I gotta check more. Thanks!
