Question: (Closed) Quick And Easy Way To Map Snps To Genes (Or Regions Near Genes)
4
gravatar for confusedious
5.6 years ago by
confusedious410
Australia
confusedious410 wrote:

Hello everyone,

I have estimated pairwise Weir & Cockerham's Fst between two populations of interest to me for a large number of SNPs in HapMap 3 using 'R'.

I hope to run high Fst SNPs through Panther to determine the breakdown of pathway involvement, but in order to do this it would be best if I could first assign these SNPs to the genes they are within or near.

Is there a quick and easy way to map a list of SNPs to genes and regions say +/- 10kb from genes?

Any help would be appreciated.

hapmap R mapping snp • 10k views
ADD COMMENTlink modified 20 months ago by nisha.ukrani0 • written 5.6 years ago by confusedious410
2

exact duplicate of

How to map a SNP to a gene around +/- 60KB ?

ADD REPLYlink written 5.6 years ago by Pierre Lindenbaum115k
1

Bedtools?

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by zx87546.0k
1

Take a look at this tutorial. Easy and very helpful! ;)

ADD REPLYlink written 5.6 years ago by Peixe570
1

In the past I've used the R package "NCBI2R". Best of luck.

ADD REPLYlink written 5.6 years ago by Sheila260

If you just want gene names with 10kb of the SNP:

$ bedmap --echo --echo-map-id-uniq --range 10000 <(vcf2bed snps.vcf) genes.bed

If you want their locations, also:

$ bedmap --echo --echo-map --range 10000 <(vcf2bed snps.vcf) genes.bed
ADD REPLYlink modified 20 months ago • written 20 months ago by Alex Reynolds26k

HEY I didn't get it, can you explain a bit more

ADD REPLYlink written 20 months ago by nisha.ukrani0

These are operations to retrieve genes within 10kb of a SNP.

If you want the very nearest gene to a SNP, you can use closest-features:

$ closest-features --closest --dist <(vcf2bed snps.vcf) genes.bed > answer.bed

This gives you, for each SNP, its nearest gene and the distance value in bases.

ADD REPLYlink written 20 months ago by Alex Reynolds26k

Hello, do we need to load any packages to run this command. I am getting an error "bedmap:command not found".

ADD REPLYlink written 20 months ago by apsairam00

The package to install is called bedops.

ADD REPLYlink modified 20 months ago • written 20 months ago by Alex Reynolds26k

Hello confusedious!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

ADD REPLYlink written 20 months ago by zx87546.0k
3
gravatar for Sean Davis
5.6 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

If you are already working in R:

  1. Convert your snps to a GRanges object
  2. Grab the locations of genes from a txdb object, from biomart, or from a UCSC track and convert to GRanges
  3. Expand the boundaries of your genes by adding 10kb (see resize() in R)
  4. Find overlapping SNPs using %over% between your two GRange objects.
ADD COMMENTlink written 5.6 years ago by Sean Davis25k
3
gravatar for Emily_Ensembl
5.6 years ago by
Emily_Ensembl16k
EMBL-EBI
Emily_Ensembl16k wrote:

You can use the Ensembl Variant Effect Predictor to see which genes are affected by your SNPs.

http://www.ensembl.org/info/docs/variation/vep/index.html

This is available as an online tool and as a Perl script. It will predict the consequences of each variant in the genes affected, giving you SO terms (see the link below). Variants within 5kb of a gene will be listed as upstream or downstream gene variants. You can also choose to get SIFT and PolyPhen scores for amino acid changes. And you can check if your variants have already been annotated in our database.

http://www.ensembl.org/info/docs/variation/predicted_data.html#consequences

If you really want to use a 10kb flank, there is a plugin that you can use with the VEP script that will allow you to change the flanks. You'll find it here:

https://github.com/ensembl-variation/VEP_plugins/blob/master/UpDownDistance.pm

Run as

perl variant_effect_predictor.pl -i input.txt -plugin UpDownDistance,10000

to set the distance to 10kb. Up/down distances can also be set separately:

perl variant_effect_predictor.pl -i input.txt -plugin UpDownDistance,10000,20000

sets up to 10kb and down to 20kb.

ADD COMMENTlink modified 5.6 years ago by Sean Davis25k • written 5.6 years ago by Emily_Ensembl16k
1
gravatar for ewre
5.6 years ago by
ewre210
United States
ewre210 wrote:

Both ncbi and ensembl have tools to do this. I have been using ncbi 'eutils link' tools to get gene related snps in batch mode. see if it can be helpful for you with little changes in the script.

#copied from NCBI efetch demonstration webpage

use LWP::Simple;
use LWP::UserAgent;
$db1 = 'gene';
$db2 = 'snp';
$linkname = 'gene_snp';

$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/''>http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

#get the gene list from file

my $fname=shift;

open(pFILE,"<$fname")||die("error open genelist file\n");
while(<pFILE>){
    chomp;
    push @ids, $_;
}
#print @ids,"\n";
#assemble  the elink URL as an HTTP POST call

$url = $base . "elink.fcgi";

$url_params = "dbfrom=$db1&db=$db2&linkname=$linkname";
foreach $id (@ids) {
   $url_params .= "&id=$id";
}

#create HTTP user agent

$ua = new LWP::UserAgent;
$ua->agent("elink/1.0 " . $ua->agent);

#create HTTP request object

$req = new HTTP::Request POST => "$url";
$req->content_type('application/x-www-form-urlencoded');
$req->content("$url_params");

#post the HTTP request

$response = $ua->request($req); 
$output = $response->content;

open (OUT, ">snp_list.txt") || die "Can't open file!\n";

while ($output =~ /<LinkSet>(.*?)<\/LinkSet>/sg) {

$linkset = $1;
if ($linkset =~ /<IdList>(.*?)<\/IdList>/sg) {
      $input = $1;
      $input_id = $1 if ($input =~ /<Id>(\d+)<\/Id>/sg); 
}

while ($linkset =~ /<Link>(.*?)</Link>/sg) {
      $link = $1;
      push (@output, $1) if ($link =~ /<Id>(\d+)<\/Id>/);
   }

print OUT "$input_id:","\n";
    foreach(@output)
    {print OUT $_,"\n";}

}

close OUT;
ADD COMMENTlink modified 20 months ago by RamRS19k • written 5.6 years ago by ewre210
1
gravatar for richardc.gsc
5.6 years ago by
richardc.gsc150
richardc.gsc150 wrote:

If you just care about the coordinates of the genes +/- some window you can get a gene list from UCSC or Ensembl in .bed format and use BEDTools.

If you want all the bells and whistles, but not as much control over the windowing around your genes, you can use SNPEff

ADD COMMENTlink written 5.6 years ago by richardc.gsc150
0
gravatar for Gabriel R.
5.6 years ago by
Gabriel R.2.5k
Center for Geogenetik KĂžbenhavns Universitet
Gabriel R.2.5k wrote:

Are your SNPs in vcf format ? If so, you could tabix them and write something like:

for region in `write command to get genomic regions with genes in the format chr:start-end`;
do
tabix vcffile $region >> smaller.vcf;
done
ADD COMMENTlink modified 5.6 years ago • written 5.6 years ago by Gabriel R.2.5k
0
gravatar for confusedious
5.6 years ago by
confusedious410
Australia
confusedious410 wrote:

I tried to use the Ensembl method above, but the input method appears to be based on chromosome position - forgive me if I am wrong about this as I am fairly new to all of this.

In essence, I have a list of SNPs by 'rs' number. Is there a simple tool that I can just dump the list of 'rs' numbers into that will return genes that they fall in or near? As I suggested 10kb either side would be nice but I'd be happy with 5kb if that is the default.

What I am looking for is pretty basic - put in 'rs' numbers and receive a list of gene symbols.

Thank you for your help and patience - as I have mentioned I am new to this. I have only recently made the jump from anthropology to genetics, so I would appreciate simplicity over all else!

ADD COMMENTlink written 5.6 years ago by confusedious410
2

The VEP allows input of just rs numbers. That works fine. Just choose "variant identifiers" as your input.

http://www.ensembl.org/info/docs/variation/vep/vep_formats.html#id

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by Emily_Ensembl16k

Alright, I'll try this. Is there a ceiling limit to how many SNPs can be entered? And when the results are returned are the gene symbols/IDs easily recovered (i.e. not buried amongst other data relating to the SNPs such as synon, non-synon and other info I won't be using)? I don't really need any information other than the gene symbols/IDs and with the number of SNPs I am dealing with (~150,000 after QC), I would prefer to have the corresponding genes returned in a user friendly format.

I have ended up with a rather large sub-set of SNPs from my original data (~1 million SNPs), based on a certain ranking of marker by marker Fst scores in pairwise comparisons that is predicted under a hypothesis relating to an environmental variable and its effects on population differentiation. The plan is to see which pathways are enriched in this sub-set of my whole data set, and to do this I need a list of the genes that my SNPs relate to (as explained in my original question).

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by confusedious410
1

The online tool has a maximum of ~750, so you're better off using the script.

It'll output in a nice table, so it should be easy to find the data you need. Have a look at this section of the documentation to see how you can format your output to suit your purposes:

http://www.ensembl.org/info/docs/variation/vep/vep_script.html#output

ADD REPLYlink written 5.5 years ago by Emily_Ensembl16k

Thanks again Emily - you are very helpful.

ADD REPLYlink written 5.5 years ago by confusedious410

Pierre Lindenbaum 's answer to How to map a SNP to a gene around +/- 60KB ? is the best way to go. Otherwise, if you want quick copy&paste&getoutput try HaploReg.

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by zx87546.0k
Please log in to add an answer.
The thread is closed. No new answers may be added.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1087 users visited in the last hour