Question: Finding the Genes corresponding to given SNPs using python
mms14013050 wrote:

Hi,

My question is how to find the genes related to a given set of SNPs. Part of the SNP data I have is as follows:

> rs987435        C       G       1       1       1       0       2
> rs345783        C       G       0       0       1       0       0
> rs955894        G       T       1       1       2       2       1
> rs6088791       A       G       1       2       0       0       1
> rs11180435      C       T       1       0       1       1       1
> rs17571465      A       T       1       2       2       2       2
> rs17011450      C       T       2       2       2       2       2
> rs6919430       A       C       2       1       2       2       2
> rs2342723       C       T       0       2       0       0       0
> rs11992567      C       T       2       2       2       2       2

The gene annotation data was downloaded from the UCSC website:

> DDX11L1   NR_046018   chr1    +   11873   14409   14409   14409
> WASH7P    NR_024540   chr1    -   14361   29370   29370   29370
> LINC01204 NR_104644   chrX    +   45364632    45386484    45386484    45386484
> LOC392232 NR_033867   chr8    -   73114986    73163869    73163869    73163869
> FBXL22    NM_203373   chr15   +   63889551    63894620    63889591    63893885
> LOC729737 NR_039983   chr1    -   134772  140566  140566  140566
> LOC100132287  NR_028322   chr1    +   323891  328581  328581  328581
> LOC100132062  NR_028325   chr1    +   323891  328581  328581  328581

where the column names are as follows:

>     table refFlat
>     "A gene prediction with additional geneName field."
>         (
>         string  geneName;           "Name of gene as it appears in Genome Browser."
>         string  name;               "Name of gene"
>         string  chrom;              "Chromosome name"
>         char[1] strand;             "+ or - for strand"
>         uint    txStart;            "Transcription start position"
>         uint    txEnd;              "Transcription end position"
>         uint    cdsStart;           "Coding region start"
>         uint    cdsEnd;             "Coding region end"
>         uint    exonCount;          "Number of exons"
>         uint[exonCount] exonStarts; "Exon start positions"
>         uint[exonCount] exonEnds;   "Exon end positions"
>         )

How can I use this data to find which gene each SNP is related to?

Thank you

snp genome gene • 511 views
ADD COMMENTlink modified 10 months ago by mforde841.0k • written 10 months ago by mms14013050
mforde841.0k wrote:

There are a lot of good annotation tools out there that you can use, including SnpEff, VariantEffectPredictor, ANNOVAR, Oncotator, SNP-nexus, etc.

ADD COMMENTlink written 10 months ago by mforde841.0k

I don't understand what you mean.

ADD REPLYlink written 10 months ago by mms14013050

If you want to look up the annotation information for a given dbSNP ID, then you need to interface with the dbSNP database directly and do a lookup. Or you can use one of the web-interface tools available for this purpose, e.g., SNP-nexus takes dbSNP IDs as input.

If you need to look up annotation information based on genomic location, for example like the data you'll see in a standard VCF file, then you can use any of the tools above, which work directly with genomic-coordinate variant calls. Alternatively, you can download the annotations (GTF, GFF) corresponding to your sequence assembly and then check each variant's position against the annotation features.
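As a rough illustration of checking positions against annotation features, here is a minimal Python sketch that uses the refFlat columns shown in the question (geneName, chrom, txStart, txEnd). The function names `load_genes` and `genes_at` are hypothetical, and the sketch assumes UCSC-style 0-based, half-open coordinates and that a chromosome and position are already known for each SNP, which the genotype file in the question does not provide.

```python
import bisect
from collections import defaultdict


def load_genes(lines):
    """Parse refFlat-style rows into per-chromosome lists sorted by txStart."""
    genes = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated rows
        name, chrom = fields[0], fields[2]
        tx_start, tx_end = int(fields[4]), int(fields[5])
        genes[chrom].append((tx_start, tx_end, name))
    for intervals in genes.values():
        intervals.sort()
    return genes


def genes_at(genes, chrom, pos):
    """Return sorted gene names whose [txStart, txEnd) interval contains pos."""
    intervals = genes.get(chrom, [])
    # Binary search: only intervals starting at or before pos can contain it.
    hi = bisect.bisect_right(intervals, (pos, float("inf"), ""))
    return sorted({name for start, end, name in intervals[:hi] if start <= pos < end})


# Two rows copied from the refFlat excerpt in the question.
rows = [
    "DDX11L1 NR_046018 chr1 + 11873 14409 14409 14409",
    "WASH7P NR_024540 chr1 - 14361 29370 29370 29370",
]
genes = load_genes(rows)
print(genes_at(genes, "chr1", 14400))  # ['DDX11L1', 'WASH7P']
```

The scan over the prefix of the sorted list is linear in the worst case; for millions of lookups an interval tree or a dedicated tool would likely be faster, but the logic is the same.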

ADD REPLYlink written 10 months ago by mforde841.0k

Not exactly what OP was asking for (annotation using Python), but still the best/right answer for his issue :-)

ADD REPLYlink modified 10 months ago • written 10 months ago by WouterDeCoster24k

HOWTO: Annotations using assembly :).

But honestly, you'd be surprised how often I see bioinformaticians using Python when all they're doing is making a bunch of system calls. It's both hilarious and disturbing. I get why someone would want to do this, as I occasionally do it in R scripts, but still.

ADD REPLYlink modified 10 months ago • written 10 months ago by mforde841.0k

The thing is that my data is really large, almost 9 million SNPs, which is why I asked about Python. I'm new to Python, so I would appreciate it if anyone could help with that. I have a file with the rs IDs for the SNPs.

ADD REPLYlink written 10 months ago by mms14013050

That's quite a lot indeed. Then I think the easiest path is to:

  1. Download the dbSNP databank in vcf format
  2. Filter the vcf using your list of rs IDs (using grep -f)
  3. Use SnpEff or similar for annotation of the SNPs
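
For readers who would rather do step 2 in Python than with grep, the following is a minimal sketch. It assumes a standard VCF layout in which the third tab-separated column holds the ID (possibly several IDs separated by semicolons); the function names `load_ids` and `filter_vcf` are hypothetical.

```python
import gzip


def load_ids(path):
    """Read one rs ID per line into a set for fast membership tests."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_vcf(vcf_in, vcf_out, wanted):
    """Copy header lines plus records whose ID column matches a wanted rs ID."""
    opener = gzip.open if vcf_in.endswith(".gz") else open
    kept = 0
    with opener(vcf_in, "rt") as src, open(vcf_out, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # keep the VCF header intact
                continue
            parts = line.rstrip("\n").split("\t", 3)
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            # VCF columns: CHROM, POS, ID, ...; ID may list several rs IDs.
            if wanted.intersection(parts[2].split(";")):
                dst.write(line)
                kept += 1
    return kept
```

A call such as `filter_vcf("dbsnp.vcf.gz", "subset.vcf", load_ids("rs_ids.txt"))` (file names are placeholders) streams the file line by line, so memory use is driven by the size of the ID set rather than the size of the VCF.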
ADD REPLYlink written 10 months ago by WouterDeCoster24k

Thank you. Can you explain how to get started with SnpEff? I have been searching the whole day and I'm tired, so would you please give me the steps? I appreciate your help.

ADD REPLYlink written 10 months ago by mms14013050

I think the manual is very clear, what did you try and what doesn't work? Try to be specific when asking questions.

ADD REPLYlink written 10 months ago by WouterDeCoster24k

I tried SCAN (http://www.scandb.org/newinterface/index_v1.html), but my data is too large for it to work. SNP-nexus didn't work either, for the same reason.

ADD REPLYlink written 10 months ago by mms14013050

I listed the steps here: C: Finding the Genes corresponding to given SNPs using python
What's wrong? Where are you stuck?

ADD REPLYlink written 10 months ago by WouterDeCoster24k

I used the following:

curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp147.txt.gz" | gunzip > datasnp
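
A table dump like this one carries genomic coordinates rather than gene names, so one possible next step is to pull out the position of each rs ID of interest. The sketch below assumes the usual UCSC snp table column order (bin, chrom, chromStart, chromEnd, name, ...), which should be checked against the matching snp147.sql schema file; the function name `snp_positions` is hypothetical.

```python
def snp_positions(table_lines, wanted):
    """Map each wanted rs ID to a list of (chrom, chromStart) placements."""
    positions = {}
    for line in table_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or truncated rows
        # Assumed column order: bin, chrom, chromStart, chromEnd, name, ...
        chrom, start, rs_id = fields[1], int(fields[2]), fields[4]
        if rs_id in wanted:
            # An rs ID can appear on more than one row, so collect them all.
            positions.setdefault(rs_id, []).append((chrom, start))
    return positions
```

Passing an open file handle for `datasnp` as `table_lines` and a set of rs IDs as `wanted` keeps only the matching rows in memory; the resulting coordinates can then be compared against gene intervals.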

ADD REPLYlink modified 10 months ago • written 10 months ago by mms14013050

Can you please tell me where I can find the dbSNP databank in VCF format, or can I use the txt format I mentioned above?

ADD REPLYlink modified 10 months ago • written 10 months ago by mms14013050

Now I have the dbSNP database in txt format, and I also have my dbSNP IDs in a txt file called pre_snpinfo_tumor (4th column).

What is the code to filter the SNP database?

ADD REPLYlink written 10 months ago by mms14013050

You would need a file containing just the dbSNP identifiers; you can extract that column using awk or cut. This file would be DesiredIdentifiers.txt. Then you can use a simple grep, such as

grep -w -F -f DesiredIdentifiers.txt dbSNPfile.vcf
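
The identifier file can also be produced in Python instead of awk or cut. This is a minimal sketch that assumes the input is whitespace-delimited, that the rs IDs sit in the 4th column as described above, and that every identifier starts with "rs" (which also skips a header row, if there is one); the function name `extract_ids` is hypothetical.

```python
def extract_ids(in_path, out_path, column=3):
    """Write the unique rs IDs found in one column, one per line."""
    seen = set()
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) <= column:
                continue  # row is too short to have the requested column
            rs_id = fields[column]
            # Assumption: real identifiers start with "rs"; this skips headers.
            if rs_id.startswith("rs") and rs_id not in seen:
                seen.add(rs_id)
                dst.write(rs_id + "\n")
    return len(seen)
```

For example, `extract_ids("pre_snpinfo_tumor", "DesiredIdentifiers.txt")` would write the deduplicated IDs and return how many were found. `column` is zero-based, so the default of 3 selects the 4th column.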
ADD REPLYlink written 10 months ago by WouterDeCoster24k

I'm really confused. Could you please tell me where to find the VCF file for the SNP database? What I downloaded from the UCSC website doesn't contain the gene names.

ADD REPLYlink written 10 months ago by mms14013050