Question: SNPs to rs format
0
gravatar for hosein_salehi6
17 months ago by
hosein_salehi60 wrote:

I have data of ovine high-density 600K snp array ,How I can change names of SNPs to rs format ? Hi thank you for your attention This is an example of the data format . I should change only first column (names to rs format)

"Name"                      "Chr" "Position" "WDS2.GType" "WDS2.Log R Ratio" "WDS2.B AlleleFreq"
250506CS3900140500001_312.1   23   26298017   AA          0.1586    0.0009
250506CS3900218700001_1294.1  2    148802744  AB          0.1641    0.4478
250506CS3900283200001_442.1   1    188498238  AB          0.0914    0.4470
CL635241_413.1                3    182202867  BB          0.0718    1.0000
CL635750_128.1                3    223741135  AB          0.0265    0.4830
CL635944_160.1                6    114778683  BB          -0.3251   0.9972
Contig35697_5761.1            6    18835475   BB          -0.1425   0.9970
CZ924211_329.1                2    106470676  AB          -0.1050   0.49
OAR3_207847578_X.1            3    193150848  AB          -0.0122   0.4828
OAR3_211362922_X.1            3    196313951  AA          0.1171    0.0039
oar3_OAR1_119450287           1    119450287  AA          -0.1919   0.0176
oar3_OAR1_119458232           1    119458232  AA          -0.1807   0.00
OAR6_114756902_X.1            6    104240372  BB          0.0654    1.0000
OARUn.515_157736.1            15   48287818   BB          -0.0536   0.9887
OARUn.54_510669.1             16   70664506   AA          -0.4320   0.0129
OARUn.992_104641.1            18   12223461   BB          0.1197    0.9962
OARUn.992_53657_X.1           18   12171867   AA          0.0340    0.0066
s00002.1                      0    0          AB          -0.0878   0.4830
s00007.1                      20   32340549   BB          0.0397    0.9956
s00014.1                      6    48965903   BB          -0.0218   1.000
snp • 535 views
ADD COMMENTlink modified 17 months ago by Russ460 • written 17 months ago by hosein_salehi60
1

Please add an example of the data format you have.

ADD REPLYlink written 17 months ago by WouterDeCoster41k

Hello hosein_salehi6!

I'm closing this until you restructure the post, hosein_salehi6

ADD REPLYlink modified 17 months ago • written 17 months ago by RamRS24k

I've now opened it. Please also use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Formatting bar

ADD REPLYlink written 17 months ago by RamRS24k
0
gravatar for Russ
17 months ago by
Russ460
Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
Russ460 wrote:

Hi hosein_salehi6

It can be tricky dealing with non-reference species. I quickly wrote a script in python that uses the chromosome and position in the text you posted and queries a VCF file with ovine SNPs (sourced from dbSNP 150 apparently). I assumed your results are tab delimited. I accessed the vcf from ensembl:

http://useast.ensembl.org/info/data/ftp/index.html

The script is not quick because it goes through all 6 million variants in the VCF file, but it worked on the sample you provided. Also, some of the variants will not be in the vcf file (like the variant on chromosome 0 and position 0 in the text you posted). For these SNPs the result will print out with "not in vcf file".

Note that you'll want to make sure the genomes used for your snpchip and this VCF file are the same, otherwise your results will be incorrect.

To run the script, copy and paste the code below into a text editor and save it as "sheep_snps.py". Then execute python sheep_snps.py <query_file> <vcf_file> > <output_file>

#!/usr/bin/python

from __future__ import print_function
import sys

snp_dict = {}

with open(sys.argv[1], 'r') as ovine:
    for line in ovine:
        line = line.rstrip()
        if "\"" in line:
            continue
        else:
            query_chrom = line.split("\t")[1]
            query_pos = line.split("\t")[2]
            identifier = query_chrom + "_" + query_pos
            snp_dict[identifier] = line


vcf_dict = {}

with open(sys.argv[2], 'r') as vcf:
    for line in vcf:
        line = line.rstrip()
        if "#" in line:
            continue
        else:
            ref_chrom = line.split("\t")[0]
            ref_pos = line.split("\t")[1]
            ref_identifier = ref_chrom + "_" + ref_pos
            vcf_dict[ref_identifier] = line.split("\t")[2]

for k,v in snp_dict.iteritems():
    try:
        print(vcf_dict[k],v, sep = "\t")
    except:
        print("not_in_vcf_file",v, sep = "\t")

and the results look like:

rs407034953      Contig35697_5761.1            6   18835475   BB  -0.1425  0.9970
rs423741346      OARUn.992_104641.1            18  12223461   BB  0.1197   0.9962
rs55630697       250506CS3900283200001_442.1   1   188498238  AB  0.0914   0.4470
rs420791893      CL635750_128.1                3   223741135  AB  0.0265   0.4830
rs404585538      s00014.1                      6   48965903   BB  -0.0218  1.000
rs428970013      oar3_OAR1_119458232           1   119458232  AA  -0.1807  0.00
rs399786672      CL635944_160.1                6   114778683  BB  -0.3251  0.9972
rs413235350      OAR3_207847578_X.1            3   193150848  AB  -0.0122  0.4828
rs420600628      OAR6_114756902_X.1            6   104240372  BB  0.0654   1.0000
rs400403799      OARUn.515_157736.1            15  48287818   BB  -0.0536  0.9887
rs411856398      OARUn.54_510669.1             16  70664506   AA  -0.4320  0.0129
rs414154811      s00007.1                      20  32340549   BB  0.0397   0.9956
rs429408796      oar3_OAR1_119450287           1   119450287  AA  -0.1919  0.0176
rs413244887      CZ924211_329.1                2   106470676  AB  -0.1050  0.49
rs55630642       250506CS3900140500001_312.1   23  26298017   AA  0.1586   0.0009
rs55630663       250506CS3900218700001_1294.1  2   148802744  AB  0.1641   0.4478
rs409664674      CL635241_413.1                3   182202867  BB  0.0718   1.0000
rs402411363      OARUn.992_53657_X.1           18  12171867   AA  0.0340   0.0066
not_in_vcf_file  s00002.1                      0   0          AB  -0.0878  0.4830
rs401152114      OAR3_211362922_X.1            3   196313951  AA  0.1171   0.0039
ADD COMMENTlink modified 17 months ago • written 17 months ago by Russ460
1

You can use column -t to display text better :-)

ADD REPLYlink written 17 months ago by RamRS24k

Nice tip, thanks Ram.

ADD REPLYlink written 17 months ago by Russ460

Thank you for your attention .How I can to receive VCF or rs file for my data ?

ADD REPLYlink written 17 months ago by hosein_salehi60

In the link I provided above to Ensembl. You will see there is a big table near the bottom of the page. You can filter the results for "sheep" by using the box at the top right of the table. The VCF file is one of the options available there.

ADD REPLYlink written 17 months ago by Russ460
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 971 users visited in the last hour