Question: Why Are These SNPs On The Wrong Strand Compared To The Reference Genome?
6.0 years ago
Click downvote670 wrote:


I have some GWAS data that I'd like to impute, but for that to work every SNP needs to be aligned to the forward strand of the reference genome. This is not as simple as it sounds, because many SNPs are strand-ambiguous (A/T or C/G combinations).
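To make the ambiguity concrete, here is a minimal sketch of a check for strand-ambiguous SNPs. The function name `is_strand_ambiguous` is made up for illustration; the only thing it relies on is that A pairs with T and C pairs with G.

```python
# Complement of each nucleotide on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1, a2):
    """Return True for A/T and C/G SNPs.

    For these pairs the two alleles are each other's complement, so the
    allele pair looks identical on both strands and the alleles alone
    cannot tell you which strand was reported.
    """
    return COMPLEMENT[a1.upper()] == a2.upper()


for pair in [("A", "T"), ("C", "G"), ("C", "T"), ("A", "G")]:
    print(pair, is_strand_ambiguous(*pair))
```

SNPs for which this returns True need something other than the alleles (strand files, allele frequencies) to decide the strand.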

Therefore I've tried looking at both the strand data for the chips and the SNP manifests, comparing them to the SNPs that are flipped, but I cannot see any pattern. What I'm looking for is a pattern in these files that explains why the SNPs in the first list are flipped relative to the reference genome. If you see anything, or need more info, please ask.

PS: the data might have been botched by the researchers who originally used them (they have long since moved on).

Here is the head of a list of SNPs that are flipped compared to the reference (name, chr, position, a1, a2, reference nucleotide):

rs1774963   1   21703207   C   T   G
rs2257576   1   83736947   T   C   A
rs315041    1   77055775   A   G   C
rs3094315   1   752565     C   T   G
rs3737728   1   1021414    T   C   A
rs11721     1   1152630    T   G   C
rs2887286   1   1156130    G   A   T
rs3813199   1   1158276    T   C   G
rs3766186   1   1162434    T   G   C

Here are the corresponding entries from the strand file:

rs1774963   1   21703208   99.1735537190083   +   AG
rs2257576   1   83736948   100                +   AG
rs315041    1   77055776   99.1735537190083   -   AG
rs3094315   1   752566     99.1735537190083   +   AG
rs3737728   1   1021415    100                +   AG
rs11721     1   1152631    99.1735537190083   +   AC
rs2887286   1   1156131    100                -   AG
rs3813199   1   1158277    99.1735537190083   +   AG
rs3766186   1   1162435    99.1735537190083   +   AC
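As a rough sketch of how these rows could be used: the code below assumes that the fifth column is the strand and the sixth the chip alleles, and that a `-` means the listed alleles have to be complemented to get forward-strand alleles. Neither assumption is confirmed anywhere in this post, and the helper name `forward_alleles` is made up. The reference nucleotides are copied from the first list.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# (name, strand, alleles) from the strand file rows above,
# plus the reference nucleotide from the first list.
ROWS = [
    ("rs1774963", "+", "AG", "G"),
    ("rs2257576", "+", "AG", "A"),
    ("rs315041", "-", "AG", "C"),
    ("rs3094315", "+", "AG", "G"),
    ("rs3737728", "+", "AG", "A"),
    ("rs11721", "+", "AC", "C"),
    ("rs2887286", "-", "AG", "T"),
    ("rs3813199", "+", "AG", "G"),
    ("rs3766186", "+", "AC", "C"),
]


def forward_alleles(strand, alleles):
    """Assumption: '-' means the alleles are given on the reverse strand."""
    if strand == "-":
        return "".join(COMPLEMENT[base] for base in alleles)
    return alleles


forward = {name: forward_alleles(strand, alleles) for name, strand, alleles, _ in ROWS}
ref_found = {name: ref in forward[name] for name, _, _, ref in ROWS}

for name, _, _, ref in ROWS:
    print(name, forward[name], "ref:", ref, "ref among alleles:", ref_found[name])
```

Under those assumptions, the forward-strand alleles derived from the strand file contain the reference nucleotide for every one of these nine rows, which suggests the strand file itself is consistent with the reference.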

Here are the corresponding entries from the SNP table/manifest:

Name        SNP     ILMN Strand   Customer Strand
rs1774963   [A/G]   TOP           BOT
rs2257576   [A/G]   TOP           BOT
rs315041    [T/C]   BOT           TOP
rs3094315   [T/C]   BOT           TOP
rs3737728   [A/G]   TOP           BOT
rs11721     [A/C]   TOP           BOT
rs2887286   [T/C]   BOT           TOP
rs3813199   [A/G]   TOP           BOT
rs3766186   [A/C]   TOP           BOT

What is the rule that explains why the SNPs in the first list are on the opposite strand from the reference genome? Or might these data be nonsensical?

gwas snp strand • 3.2k views
3.8 years ago
European Union
nadne40 wrote:

Did anyone resolve this?


My guess is that the alleles in the first list were reported on the reverse strand (e.g. rs3737728 in dbSNP), whereas they are reported on the forward strand elsewhere (e.g. Ensembl). I haven't checked all of the variants above, but this seems to be the case for the ones I did check. Check this FAQ.
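A quick way to test this on the rows from the question is sketched below. The labels and the function name `classify` are made up for illustration; the logic is only "does the reference nucleotide match one of the alleles as given, or one of their complements".

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# (name, a1, a2, reference nucleotide) from the first list in the question.
SNPS = [
    ("rs1774963", "C", "T", "G"),
    ("rs2257576", "T", "C", "A"),
    ("rs315041", "A", "G", "C"),
    ("rs3094315", "C", "T", "G"),
    ("rs3737728", "T", "C", "A"),
    ("rs11721", "T", "G", "C"),
    ("rs2887286", "G", "A", "T"),
    ("rs3813199", "T", "C", "G"),
    ("rs3766186", "T", "G", "C"),
]


def classify(a1, a2, ref):
    """Label a SNP by how its alleles relate to the reference nucleotide."""
    if COMPLEMENT[a1] == a2:
        return "ambiguous"  # A/T or C/G: alleles alone cannot decide
    if ref in (a1, a2):
        return "forward"    # reference matches an allele as given
    if ref in (COMPLEMENT[a1], COMPLEMENT[a2]):
        return "reverse"    # reference matches only after complementing
    return "mismatch"       # reference matches on neither strand


results = {name: classify(a1, a2, ref) for name, a1, a2, ref in SNPS}
for name, label in results.items():
    print(name, label)
```

All nine rows shown in the question come out as "reverse", which is consistent with the alleles having been reported on the reverse strand.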

written 3.8 years ago by Denise - Open Targets
3.7 years ago
European Union
nadne40 wrote:

Check out this resource for updating strands and for A/B mapping.

