Question: Find SNPs in a list of genes
0
gravatar for win
5.8 years ago by
win850
India
win850 wrote:

I have a list of genes such as BRCA1, BRCA2, PTEN etc and I want a list of SNPs found in each gene. How could I do that programatically?

Thanks in advance.

 

snp • 2.1k views
ADD COMMENTlink modified 5.8 years ago by Alex Reynolds30k • written 5.8 years ago by win850
1

Could you tell us what kind of data do you have on those genes? 

ADD REPLYlink written 5.8 years ago by guillaume.rbt800

Hi, I just have a list of genes like this:

BRCA1

BRCA2, 

PTEN

and I want a list of SNP for these genes.

ADD REPLYlink written 5.8 years ago by win850
3
gravatar for Alex Reynolds
5.8 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

If you first need a BED file of genes (the genomic locations of your genes), then let's say you have a file called genes.txt that contains your gene names of interest:

$ cat genes.txt
BRCA1
BRCA2
PTEN
$

You can pull GENCODE records (via wget), filter them for genes (via awk), filter again for each of your gene names (via grep -f), and convert them to sorted BED (via convert2bed):

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3=="gene"' - \
    | grep -f genes.txt - \
    | convert2bed -i gff - \
    > genes.bed

No sort-bed step is required here; the default output of convert2bed is sorted (assuming BEDOPS is installed).

Once you have genes.bed, you can use it as the input for the other answer I provided in this post.

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by Alex Reynolds30k
3
gravatar for Alex Reynolds
5.8 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

Programmatically, here's a demonstration of how to do this for 1000Genome SNPs on chrY.

First, start with a sorted set of genes in BED format:

$ sort-bed unsorted_genes.bed > genes.bed

(If you don't have locations of your genes, just edit your question to describe what you have, which we can probably work into a BED file.)

Then map 1000Genome SNPs to those genes:

$ wget -qO- ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated.20130502.genotypes.vcf.gz \
    | gunzip --stdout - \
    | convert2bed -i vcf - \
    | awk '{print "chr"$0;}' - \
    | bedmap --echo --echo-map-id-uniq --chrom chrY genes.bed - \
    > answer.chrY.bed

This uses convert2bed to convert VCF to sorted BED, and BEDOPS bedmap --echo-map-id-uniq to map a unique list of SNP IDs to each gene in genes.bed.

We add --chrom chrY to bedmap in order focus file operations on the part of the file we're interested in, which improves performance.

The awk statement prepends each line of the BED output with the string "chr", to reformat the chromosome name to UCSC standards, which is the naming scheme used in GENCODE records pulled in the second answer in this post. We want to make sure the chromosome names line up when mapping genes to SNPs.

This is just for chrY. You could probably parallelize this for the other human chromosomes pretty easily (see: www.biostars.org/p/63816/ for other examples):

$ parallel "wget -qO- ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.{}.phase3_integrated.20130502.genotypes.vcf.gz | gunzip --stdout - | convert2bed -i vcf - | awk '{print \"chr\"$0;}' | bedmap --echo --echo-map-uniq-id --chrom {} genes.bed - > answer.{}.bed"::: {1..22} X Y

Then collate the per-chromosome results into one file:

$ bedops --everything answer.*.bed > answer.bed
ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by Alex Reynolds30k

I have a list of genes as follows:

BRCA1

BRCA2, 

PTEN

Will i be able to create a .bed from that and use it for getting variants for each gene.

ADD REPLYlink written 5.8 years ago by win850

See the answer provided below, which should give you a list of BED elements for each gene.

ADD REPLYlink written 5.8 years ago by Alex Reynolds30k
1
gravatar for Emily_Ensembl
5.8 years ago by
Emily_Ensembl21k
EMBL-EBI
Emily_Ensembl21k wrote:

You can use the Ensembl Perl API.

ADD COMMENTlink written 5.8 years ago by Emily_Ensembl21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2058 users visited in the last hour