Why doesn't VEP take strand into account?
2
0
4 months ago
magnolia • 0

Hi,

I'm annotating my variants with VEP. To test, I picked a random rsID and created a test input with all possible options with strand and without strand information.

But I get both 1 and -1 strands in my output and exactly the same number of rows in both cases. Since I don't have transcript information, I also tried the --pick_allele and --pick options, but I still mostly get -1 strand results.

How does providing strand information play a role then?

Input with strand

1   45795027    45795027    C/G +
1   45795027    45795027    C/T +
1   45795027    45795027    G/G +
1   45795027    45795027    T/T +


Input without strand

1   45795027    45795027    C/G
1   45795027    45795027    C/T
1   45795027    45795027    G/G
1   45795027    45795027    T/T

vep ensembl snp • 286 views
4
4 months ago

1   45795027    45795027    C/G +
1   45795027    45795027    G/C -


Can you see how the two lines above are the same variant, just reported on different strands? If you report all your alleles on the positive strand, you don't need to include strand, as the VEP will assume you've done this. If you report some on the reverse strand, you must include this information so the VEP knows which alleles to use.

Variants will always affect features on both strands, so the VEP will always report these, regardless of what strand you've used for your alleles.
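To make the equivalence of the two lines above concrete, here is a minimal Python sketch that rewrites a default-format input line onto the forward strand by reverse-complementing its alleles. The helper names (`revcomp`, `to_forward_strand`) are hypothetical and not part of VEP, and treating a missing strand column as forward, or accepting `1`/`-1` as well as `+`/`-`, are assumptions made for illustration only.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of an allele string."""
    return seq.translate(COMPLEMENT)[::-1]


def to_forward_strand(line):
    """Parse one whitespace-separated input line (hypothetical helper).

    Returns (chrom, start, end, alleles) with the alleles expressed on the
    forward strand. A missing strand column is assumed to mean forward.
    """
    fields = line.split()
    chrom, start, end, alleles = fields[:4]
    strand = fields[4] if len(fields) > 4 else "+"
    if strand in ("-", "-1"):
        # Alleles were given on the reverse strand: flip each one.
        alleles = "/".join(revcomp(a) for a in alleles.split("/"))
    return (chrom, int(start), int(end), alleles)


forward = to_forward_strand("1   45795027    45795027    C/G +")
reverse = to_forward_strand("1   45795027    45795027    G/C -")
print(forward == reverse)  # both lines describe the same variant
```

Once both lines are expressed on the same strand they compare equal, which is why the strand column only matters for interpreting the alleles you supply, not for deciding which features get reported.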

1
4 months ago

This is DNA you're talking about. The SNP will be found at that position in both strands. That strand column is being ignored.

0

Thank you for your answer. It's interesting that it will be ignored. The documentation says strand is required:

The default format is a simple whitespace-separated format (columns may be separated by space or tab characters), containing five required columns plus an optional identifier column

https://m.ensembl.org/info/docs/tools/vep/vep_formats.html

0

Let me wildly guess: I think the strand information is necessary for annotation statements, such as overlap with promoters/TSS/TES etc., so that the strand tells the tool whether the start coordinate is actually the start of an overlapping feature on the plus strand rather than the end of a feature on the minus strand. For the mutation itself it does not matter since, as pointed out, DNA is double-stranded and therefore any mutation always occurs on both strands.

0

Oh okay. Thank you for the explanation. My results also have problems, like reporting variants even though the input is homozygous reference, especially when there are multiple alternative alleles at that position. I thought maybe strand played a role in this, but no, it obviously has a different cause. I gotta check more. Thanks!
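One possible angle on the homozygous-reference issue, offered as a guess rather than a diagnosis: the VEP format documentation describes the allele column as a pair of alleles with the reference allele first, not as a genotype. Under that reading, lines such as `G/G` or `T/T` carry no alternate allele at all. The sketch below only flags such lines in a test input; the function name `has_alternate_allele` is hypothetical, and how VEP itself handles these lines is not something this snippet checks.

```python
def has_alternate_allele(line):
    """Return True if the allele column lists an allele differing from the first.

    Assumes the fourth whitespace-separated column is REF/ALT[/ALT...],
    with the reference allele first.
    """
    fields = line.split()
    ref, *alts = fields[3].split("/")
    return any(alt != ref for alt in alts)


test_input = [
    "1   45795027    45795027    C/G",
    "1   45795027    45795027    C/T",
    "1   45795027    45795027    G/G",
    "1   45795027    45795027    T/T",
]

for line in test_input:
    # Print the allele column alongside whether it contains a real alternate.
    print(line.split()[3], has_alternate_allele(line))
```

Lines flagged `False` here are worth inspecting separately before drawing conclusions from their annotations.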