Question

Snps Location Annotations

12.9 years ago

J.F.Jiang ▴ 910

Hello, Biostar members:

I have collected about 2,000 SNPs for my SNP study. Recently I came across a paper in which the SNPs are separated into several categories: coding region, 3'UTR, 5'UTR, intergenic, promoter, TFBS, miRNA, enhancer...

So I would like to know how I can apply this kind of classification to my own SNP set.

If anyone knows how to do it, could you point me to a database or scripts for this job?

Thank you!

snp annotation • 5.1k views

Updated 9.2 years ago by Alex Reynolds 35k • written 12.9 years ago by J.F.Jiang ▴ 910

score 1 · Answer 1 · 2011-05-27

12.9 years ago

Travis ★ 2.8k

If I understand correctly, this could well be what you are after:

http://snpeff.sourceforge.net/

Also this:

http://www.openbioinformatics.org/annovar/

I'm a newbie and plan to try both but haven't gotten around to it yet.

I have looked into the two tools, and it seems that I did not state the problem clearly. The 2,000 SNPs I collected do not come from an NGS platform or an array; I gathered them from papers and databases. So my file contains only each SNP's rs number, chromosome, position and alleles, nothing else.

Thus, I want to quickly find out which category each SNP falls into. The first tool looks like what I need, so I will check it carefully to confirm.

Reply written 12.9 years ago by J.F.Jiang ▴ 910

score 1 · Answer 2 · 2011-07-28

Considering human genome annotation (pay attention to the human genome version you have worked with, so that you select the matching annotations), I would go for one of these two:

If you are looking for a local tool that annotates your variants by downloading each required database and processing it locally, I would go for ANNOVAR. It is reusable, so it is the best option if you plan to annotate often or to include it in your own variant detection pipeline. It is also the most complete option we have found so far, and the one we currently use in our lab.
If you are willing to send your variants to an online web service and simply retrieve the annotated results, I would go for SeattleSeq Annotation. It is fast and simple to use, yet the annotation it provides is quite dense.

These two are valid options for the thousands of variants coming out of an NGS experiment, so I'm pretty sure that if you convert your SNP list into a format that either program accepts, you will be able to annotate it easily.
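As a rough illustration of the "format your SNP list" step, here is a minimal sketch that turns a tab-delimited list (rs number, chromosome, position, alleles) into a bare-bones VCF. The function name snplist_to_vcf and the column order are hypothetical, and the sketch assumes the alleles column is written as REF/ALT, which a list gathered from papers may not guarantee. Check each tool's documentation for the input formats it actually accepts.

```shell
# snplist_to_vcf LIST
# Print a minimal VCF for a tab-delimited LIST with the columns:
#   rsID, chromosome, position (1-based), alleles
# Assumption: alleles are written as REF/ALT (e.g. "A/G").
snplist_to_vcf() {
  # Header lines required by the VCF format
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  awk -F'\t' 'BEGIN { OFS = "\t" }
    {
      split($4, allele, "/")   # "A/G" -> allele[1]="A", allele[2]="G"
      print $2, $3, $1, allele[1], allele[2], ".", ".", "."
    }' "$1" | sort -k1,1 -k2,2n   # order records by chromosome, then position
}
```

Usage would look like `snplist_to_vcf snps.tsv > variants.vcf`, where snps.tsv is a placeholder file name. Chromosome names are passed through unchanged, so they still need to match the naming used by the annotation (for example chr1 versus 1).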

Ram · Answer 3 · 2015-02-13

9.2 years ago

Alex Reynolds 35k

One might start with GFF-formatted GENCODE annotations:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz | gunzip --stdout - > gencode.v21.gff

Using the feature ontology defined here, one can segregate GFF annotations by feature type (see: http://song.cvs.sourceforge.net/viewvc/song/ontology/sofa.obo?revision=1.217). Feature types include keywords like three_prime_UTR, promoter, etc. We can grab a sorted listing of feature types to automate this process. For example:

$ wget -qO- 'http://song.cvs.sourceforge.net/viewvc/song/ontology/sofa.obo?revision=1.217' | grep '^name:' | sed 's/name: //' | sort > gff_feature_types.txt

We can then segregate the GENCODE annotations by feature type:

$ while read feature_type; do grep ${feature_type} gencode.v21.gff > feature.${feature_type}.gff; done < gff_feature_types.txt
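One caveat with the grep above: it matches the keyword anywhere on the line, so a short type such as gene would also pull in every line whose attributes column mentions something like gene_id. A minimal alternative sketch, assuming a standard nine-column tab-delimited GFF, compares only column 3 (the feature type field) and writes one file per type actually present. The helper name split_gff_by_type is hypothetical.

```shell
# split_gff_by_type GFF OUTDIR
# Write OUTDIR/feature.<type>.gff for each distinct feature type found in
# column 3 of GFF. Comment and header lines (starting with "#") are skipped.
split_gff_by_type() {
  gff=$1
  outdir=$2
  mkdir -p "$outdir"
  awk -F'\t' -v dir="$outdir" '
    /^#/ { next }                          # skip "##gff-version" and comments
    NF >= 9 {
      out = dir "/feature." $3 ".gff"      # file name derived from column 3
      print > out
    }
  ' "$gff"
}
```

For example, `split_gff_by_type gencode.v21.gff .` would write the per-type files into the current directory. Only types that occur in the file get an output file, so no empty files are created.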

Let's assume that you have your variants in a VCF-formatted file called variants.vcf. Let's convert it to BED with vcf2bed:

$ vcf2bed < variants.vcf > variants.bed

For each smaller annotation file that is of non-zero size, we can convert its annotations to BED elements with gff2bed. We then perform set operations against the variants, separating them into per-feature-type categories based on one or more bases of overlap with the annotation subset:

$ find . -name feature.*.gff ! -size 0 -exec bedops --element-of 1 variants.bed <(gff2bed < {}) > variants.{}.bed \;

Each non-empty file variants.*.bed contains variants that overlap a GENCODE v21 feature by its feature type.
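To get the kind of per-category breakdown the question asks about, the per-feature files can be tallied. This is a small sketch that assumes the output files are named variants.feature.<type>.bed inside one directory; that naming and the helper name count_by_feature are assumptions, so adjust the pattern to whatever names your run produces.

```shell
# count_by_feature DIR
# Print "<feature type><TAB><number of variants>" for every file in DIR
# named variants.feature.<type>.bed (one BED line per overlapping variant).
count_by_feature() {
  for f in "$1"/variants.feature.*.bed; do
    [ -e "$f" ] || continue                # pattern matched nothing
    ftype=${f##*/variants.feature.}        # strip directory and prefix
    ftype=${ftype%.bed}                    # strip extension
    printf '%s\t%s\n' "$ftype" "$(awk 'END { print NR }' "$f")"
  done
}
```

Calling `count_by_feature .` prints one line per feature type. A variant can appear under several types (for instance both gene and exon), so the counts are not expected to add up to the size of the input list.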

Updated 4.5 years ago by Ram 43k • written 9.2 years ago by Alex Reynolds 35k

I am trying to use the code above to annotate a few sites. I can generate all the feature.${feature_type}.gff files, but the last command fails with the errors shown below.

find . -name feature.*.gff ! -size 0 -exec bedops --element-of 1 sample1.bed <(gff2bed < {}) > variants.{}.bed \;

-bash: {}: No such file or directory
find: paths must precede expression: feature.coding_region_of_exon.gff

Any suggestion how I can modify the code? Thanks

Reply updated 4.5 years ago by Ram 43k • written 8.3 years ago by Sandeep ▴ 260

When the -exec option gets complicated, I find it easier to build a for loop. In fact, I always try to write code as visually as possible so that I can review it quickly when needed.

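for file in $(find . -name 'feature.*.gff' ! -size 0); do
  name=$(basename "$file" .gff)   # ./feature.exon.gff -> feature.exon
  # "-" tells bedops to read the converted annotations from the pipe
  gff2bed < "$file" \
  | bedops --element-of 1 sample1.bed - \
  > "variants.$name.bed"
done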

Reply updated 4.5 years ago by Ram 43k • written 8.3 years ago by Jorge Amigo 14k

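Perhaps try quoting the -name pattern and wrapping the command in a quoted bash -c call, so that find (rather than your interactive shell) supplies the file name:

find . -name 'feature.*.gff' ! -size 0 -exec bash -c 'bedops --element-of 1 sample1.bed <(gff2bed < "$1") > "variants.$(basename "$1" .gff).bed"' _ {} \;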

Reply updated 4.5 years ago by Ram 43k • written 8.3 years ago by Alex Reynolds 35k

score 0 · Answer 4 · 2011-07-27

12.7 years ago

Radhouane Aniba ▴ 790

I think PLINK is well suited to your problem; look at its SNP annotation feature.
