Question

Why Are These SNPs On The Wrong Strand Compared To The Reference Genome?

2

10.2 years ago

Click downvote ▴ 720

Hi--

I have some GWAS data that I'd like to impute, but for that every SNP needs to be aligned to the forward strand of the reference genome. This is not as simple as it sounds, because many SNPs are strand-ambiguous (A/T or C/G), so their alleles look the same on either strand.
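To make the ambiguity concrete, here is a minimal Python sketch (the helper names `complement` and `is_strand_ambiguous` are made up for illustration). A SNP is strand-ambiguous when its two alleles are complements of each other, so flipping it gives back the same allele pair.

```python
# Watson-Crick complements for the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(allele):
    """Return the base on the opposite strand."""
    return COMPLEMENT[allele.upper()]


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose allele pair is identical on both strands."""
    return complement(a1) == a2.upper()


for a1, a2 in [("A", "T"), ("C", "G"), ("A", "G"), ("T", "C")]:
    label = "ambiguous" if is_strand_ambiguous(a1, a2) else "unambiguous"
    print(a1, a2, label)
```

For A/G or T/C SNPs the strand can in principle be inferred by comparing the alleles with the reference base; for A/T and C/G SNPs it cannot.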

I have therefore looked at both the strand files for the chip and the SNP manifest, and compared them with the SNPs that are flipped, but I cannot see any pattern. What I'm looking for is a pattern in these files that explains why the SNPs in the first list are on the wrong strand relative to the reference genome. If you see anything, or need more info, please do ask.

PS: the data might have been botched by the researchers who originally used them (they have long since moved on).

Here is the head of the list of SNPs that are flipped compared to the reference (name, chr, position, a1, a2, reference nucleotide):

rs1774963       1 21703207 C T G
rs2257576       1 83736947 T C A
rs315041        1 77055775 A G C
rs3094315       1 752565 C T G
rs3737728       1 1021414 T C A
rs11721         1 1152630 T G C
rs2887286       1 1156130 G A T
rs3813199       1 1158276 T C G
rs3766186       1 1162434 T G C
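For what it's worth, the kind of check that flags these rows as flipped could look like the sketch below. This is illustrative Python, not the exact script used, and the function name `classify` is hypothetical. A row counts as flipped when the reference base is not one of the two alleles but is one of their complements.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Rows copied from the list above: name, chr, position, a1, a2, reference base.
ROWS = """
rs1774963 1 21703207 C T G
rs2257576 1 83736947 T C A
rs315041 1 77055775 A G C
rs3094315 1 752565 C T G
rs3737728 1 1021414 T C A
rs11721 1 1152630 T G C
rs2887286 1 1156130 G A T
rs3813199 1 1158276 T C G
rs3766186 1 1162434 T G C
"""


def classify(a1, a2, ref):
    """Compare a SNP's alleles with the reference base."""
    alleles = {a1, a2}
    flipped = {COMPLEMENT[a] for a in alleles}
    if alleles == flipped:
        return "ambiguous"  # A/T or C/G: strand cannot be inferred this way
    if ref in alleles:
        return "forward"
    if ref in flipped:
        return "flipped"
    return "mismatch"  # reference base fits neither strand


results = {}
for line in ROWS.strip().splitlines():
    name, chrom, pos, a1, a2, ref = line.split()
    results[name] = classify(a1, a2, ref)
    print(name, results[name])
```

All nine rows come out as flipped, and none of them is an A/T or C/G SNP.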

Here are the corresponding entries from the strand file (http://www.well.ox.ac.uk/~wrayner/strand/):

rs1774963       1       21703208        99.1735537190083        +       AG
rs2257576       1       83736948        100     +       AG
rs315041        1       77055776        99.1735537190083        -       AG
rs3094315       1       752566  99.1735537190083        +       AG
rs3737728       1       1021415 100     +       AG
rs11721         1       1152631 99.1735537190083        +       AC
rs2887286       1       1156131 100     -       AG
rs3813199       1       1158277 99.1735537190083        +       AG
rs3766186       1       1162435 99.1735537190083        +       AC
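One thing that can be checked mechanically is whether the two files agree on coordinates. A small Python sketch over the rows shown above (only the names and positions are copied in; the variable names are my own):

```python
# Positions from the list of flipped SNPs.
LIST_POS = {
    "rs1774963": 21703207, "rs2257576": 83736947, "rs315041": 77055775,
    "rs3094315": 752565, "rs3737728": 1021414, "rs11721": 1152630,
    "rs2887286": 1156130, "rs3813199": 1158276, "rs3766186": 1162434,
}

# Positions from the strand file for the same SNPs.
STRAND_POS = {
    "rs1774963": 21703208, "rs2257576": 83736948, "rs315041": 77055776,
    "rs3094315": 752566, "rs3737728": 1021415, "rs11721": 1152631,
    "rs2887286": 1156131, "rs3813199": 1158277, "rs3766186": 1162435,
}

# Difference between the two coordinate sets, per SNP.
offsets = {name: STRAND_POS[name] - LIST_POS[name] for name in LIST_POS}
for name, offset in offsets.items():
    print(name, offset)
```

For these nine SNPs the strand-file position is always the list position plus one. That may mean one file is 0-based and the other 1-based, but I have not confirmed which, or whether it affects the reference base that was looked up.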

Here are the corresponding entries from the SNP table/manifest:

Name    SNP     ILMN Strand     Customer Strand
rs1774963       [A/G]   TOP     BOT
rs2257576       [A/G]   TOP     BOT
rs315041        [T/C]   BOT     TOP
rs3094315       [T/C]   BOT     TOP
rs3737728       [A/G]   TOP     BOT
rs11721         [A/C]   TOP     BOT
rs2887286       [T/C]   BOT     TOP
rs3813199       [A/G]   TOP     BOT
rs3766186       [A/C]   TOP     BOT

What is the rule that explains why the SNPs in the first list are on the opposite strand to the reference genome? Or might these data be nonsensical?

snp gwas strand • 4.7k views

ADD COMMENT • link updated 7.9 years ago by nadne ▴ 40 • written 10.2 years ago by Click downvote ▴ 720

score 0 · Answer 1 · 2016-05-12

0

7.9 years ago

nadne ▴ 40

Did anyone resolve this?

ADD COMMENT • link 7.9 years ago by nadne ▴ 40

0

I would have thought that the alleles in the first list were reported on the reverse strand in one source (e.g. rs3737728 in dbSNP) but on the forward strand elsewhere (e.g. Ensembl). I have not checked all of the variants above, but this seems to be the case for the ones I did check. Check this FAQ.

ADD REPLY • link 7.9 years ago by Denise CS ★ 5.2k

score 0 · Answer 2 · 2016-05-23

0

7.9 years ago

nadne ▴ 40

Check out this resource for updating strands and A/B mapping.

ADD COMMENT • link 7.9 years ago by nadne ▴ 40