SNPs in Noncoding Regions of the Genome
13.2 years ago
Kamila001 ▴ 120

Hi,

I have more than 5 million SNPs that lie in noncoding regions of the genome. I want to assess their functional effect on phenotype and also categorize them (e.g. SNPs overlapping regulatory regions/TF sites or conserved sites, splice sites, enhancer regions, etc.). I do not know where to start or which tools I should be using. One idea I got from reading Biostar posts (http://biostar.stackexchange.com/questions/3391/analysis-of-snps-in-gene-deserts-non-genic-regions) is to look for overlaps between the phastConsElements tables from UCSC and my SNPs. Alternatively, I could somehow extract the complete list of transcription factor binding sites across the whole genome (I am not sure whether one is available for the chicken or pig genome) and then check for overlaps. These are just my thoughts; I may be on the wrong track. Since I am a beginner, I would appreciate related literature and any thoughts that could help me get started. (I have a background in bioinformatics but am not too strong in biology.) :(

snp splicing transcription binding conservation • 8.4k views

How many individuals do you have coverage for? (Are these 5 million SNPs from just one individual, or have you genotyped a group at the same loci?)

No, these SNPs were found by whole-genome resequencing of three chicken breeds (broiler, layer and Chinese Silkie) and a pool of individuals from RJF. Is that what you were asking? In fact, I am just analyzing that data.

Hi Kamila, I have come across the same question. Could you give me some suggestions, or tell me how you solved it in the end?

13.2 years ago

The TRANSFAC database of transcription factor binding sites (TFBS) does have some chicken TFs in it. I believe that the JASPAR database does, too. Overall, though, there is a paucity of known chicken and pig TFBSs. If time and money were not a factor, I would say that you should collect a panel of expression data from many cell/tissue types from the individuals so that you can perform eQTL analysis, linking allele differences to differences in mRNA levels. But these are not easy or cheap experiments to do.

In place of the eQTL discovery, I would look for regions of conservation that fall outside of exons and that contain a SNP from your list. I know that a finch genome has been completed, so this would allow you to compare against another, similar species. Never having worked on a bird genome, I cannot say how distant you want to go in these genome-wide comparisons. The idea here is that sequence which is conserved but non-coding is likely to be regulatory, and a SNP there may have a functional consequence.
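As a rough illustration of the overlap step described above, here is a minimal Python sketch. It assumes the conserved elements and exons are available as BED-style intervals (0-based, half-open, non-overlapping within a chromosome) and the SNPs as 0-based positions. All coordinates and function names below are made up for the example; they do not come from any real track.

```python
from bisect import bisect_right
from collections import defaultdict


def index_elements(elements):
    """Group (chrom, start, end) intervals by chromosome, sorted by start.

    Assumes 0-based, half-open (BED-style) intervals that do not overlap
    each other within a chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in elements:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], intervals)
    return index


def snps_in_elements(snps, index):
    """Return the SNPs (chrom, 0-based pos) that fall inside any element."""
    hits = []
    for chrom, pos in snps:
        if chrom not in index:
            continue
        starts, intervals = index[chrom]
        i = bisect_right(starts, pos) - 1  # last element starting at or before pos
        if i >= 0 and pos < intervals[i][1]:
            hits.append((chrom, pos))
    return hits


# Toy data, invented for illustration only.
conserved = [("chr1", 100, 150), ("chr1", 400, 420), ("chr2", 10, 30)]
exons = [("chr1", 120, 140)]
snps = [("chr1", 105), ("chr1", 130), ("chr1", 150), ("chr2", 29), ("chr3", 5)]

in_conserved = snps_in_elements(snps, index_elements(conserved))
in_exons = set(snps_in_elements(snps, index_elements(exons)))

# Conserved but non-exonic SNPs are the candidates for regulatory effects.
candidates = [snp for snp in in_conserved if snp not in in_exons]
print(candidates)
```

With millions of SNPs, sorting the elements once and doing a binary search per SNP keeps the lookup cheap. A dedicated interval tool would handle overlapping elements and file parsing, which this sketch leaves out.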

Papers that come to mind are those describing multiple genomes of yeasts, or Drosophila species, or even different Arabidopsis cultivars.

Thanks, Larry, for such a detailed answer. I did work with TRANSFAC Professional, but if I am not wrong it supported human, mouse and rat. Maybe you are talking about its current version, or about its profiles for TFs? I will certainly look at the JASPAR database as well.

I am certain that a few TF binding motifs from chicken and avian viruses are in TRANSFAC. The following TRANSFAC/JASPAR models are for chicken: T00107, T00264, M00743, M00771, M00971, MA0098, T00114, T00115, T00267, M00640, T00128, M00983, T01437, MA0089, M00963, T00601, T01693, T00334, T01031, T01154, T01351, T01692, M00690, M00960, T00233, T00698, T01150, T01660, T00062
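Once a binding-site matrix is in hand, one common way to use it with SNP data is to score the reference and alternate alleles against the matrix and compare. The sketch below assumes a simple count matrix converted to log-odds scores with a uniform background, and scans the forward strand only. The matrix is a hypothetical toy, not one of the models listed above, and the function names are invented for the example.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def best_window_score(seq, matrix, snp_index):
    """Best motif score over forward-strand windows that cover snp_index."""
    width = len(matrix)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(matrix[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def allele_score_change(ref_seq, snp_index, alt_base, matrix):
    """Return (ref_score, alt_score) for the best window covering the SNP."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return (best_window_score(ref_seq, matrix, snp_index),
            best_window_score(alt_seq, matrix, snp_index))


# Hypothetical 4-column count matrix favouring "TGAC"; not a real database model.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
pwm = log_odds_matrix(toy_counts)

# The SNP at index 4 changes the A of the "TGAC" match to T.
ref_score, alt_score = allele_score_change("CCTGACGG", snp_index=4, alt_base="T", matrix=pwm)
print(ref_score > alt_score)
```

A large drop from the reference score to the alternate score suggests the SNP may weaken a predicted site. A real analysis would also scan the reverse strand and use a score threshold suited to each matrix.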

Thanks, Larry. If it's not outside the scope of this question, could you please specify the steps/options/literature you followed to retrieve these matrices? That would be really helpful.

13.2 years ago
Andrea_Bio ★ 2.8k

Your question says your SNPs are in the non-coding region, but it also says you are interested in SNPs in splice sites, which are in genes. So I'll just assume most of your SNPs aren't in genes but some could be.

If you look at gen2phen.org you will see a link called 'functional classification', which gives about 10 categories by which SNPs can be annotated and lists tools for each one. This classification is based on a pipeline called the 'pathogenic or not pipeline', whose URL I can't remember.

The broadest SNP annotation is called a gene annotation, which annotates SNPs with respect to genes and classifies them as intronic, upstream, downstream, intergenic, 3' UTR, 5' UTR or splice site and, for exonic SNPs, as stop lost, stop gained, missense, etc. I know you said your SNPs were in the non-coding region of the genome, but you might want to know which genes/transcripts they are close to, and these tools will tell you that.
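To make the gene-relative categories concrete, here is a minimal Python sketch of that kind of classification. It assumes a simplified gene model (1-based inclusive coordinates, one transcript per gene, a fixed 5 kb flank) and covers only exonic, intronic, upstream, downstream and intergenic. UTR, splice-site and coding-consequence calls need full transcript structure and are left out. The gene records and the `classify_snp` helper are hypothetical.

```python
def classify_snp(pos, genes, flank=5000):
    """Classify a 1-based SNP position relative to genes on one chromosome.

    Each gene is a dict with 'name', 'start', 'end' (1-based, inclusive),
    'strand' ('+' or '-') and 'exons' (list of (start, end) tuples).
    Returns (category, gene_name or None).
    """
    nearest = None
    for gene in genes:
        if gene["start"] <= pos <= gene["end"]:
            in_exon = any(s <= pos <= e for s, e in gene["exons"])
            return ("exonic" if in_exon else "intronic", gene["name"])
        # Distance to the gene and the side (by coordinate) the SNP lies on.
        if pos < gene["start"]:
            dist, left_of_gene = gene["start"] - pos, True
        else:
            dist, left_of_gene = pos - gene["end"], False
        if dist <= flank and (nearest is None or dist < nearest[0]):
            # Lower coordinates are upstream only for plus-strand genes.
            upstream = left_of_gene == (gene["strand"] == "+")
            nearest = (dist, "upstream" if upstream else "downstream", gene["name"])
    if nearest is None:
        return ("intergenic", None)
    return (nearest[1], nearest[2])


# Toy gene models, invented for illustration only.
toy_genes = [
    {"name": "geneA", "start": 10000, "end": 20000, "strand": "+",
     "exons": [(10000, 10500), (19000, 20000)]},
    {"name": "geneB", "start": 50000, "end": 60000, "strand": "-",
     "exons": [(50000, 51000), (59000, 60000)]},
]

for pos in (9000, 10200, 15000, 35000, 62000):
    print(pos, classify_snp(pos, toy_genes))
```

Note that upstream and downstream depend on the strand of the gene: position 62000 lies after geneB in coordinate terms but is upstream of it, because geneB is on the minus strand.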

There are then tools that look at missense mutations to see whether they could have a pathogenic effect on protein structure (PolyPhen and snp). If you only have intergenic SNPs you won't be that interested in this. I just felt it would be remiss not to mention these tools, as they are the ones that dominate the literature.

You will be most interested in tools that classify SNPs with respect to regulatory regions such as transcription factor binding sites, promoters and RNA genes. There are tonnes of these, and you will see links on the page I gave. If your SNP is in an RNA gene, for example, tools can tell you the effect of the SNP on the RNA structure. I don't know whether you are aware that some RNA molecules regulate gene expression and may not be able to bind to their target site if their structure is compromised (you said your biology was weak; I hope that's not patronising). For splice sites, there are tools to tell you whether the SNP will produce aberrant splicing.

The best tool that I have seen recently is ANNOVAR. It lets you annotate public and novel SNPs from any species with respect to any genomic feature for which a GFF3 or VCF file exists. So you can download UCSC genome tracks for known genes, transcription factor binding sites, RNA genes, etc. and get your SNPs annotated with respect to these tracks. I've never used it, but apparently it's very quick and can annotate 5 million SNPs in 5 minutes. It also has the sequence conservation information you mention in the question.
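For orientation, ANNOVAR's plain-text input uses five leading columns per variant: chromosome, start, end, reference allele and observed allele, with start equal to end for a SNP. The Python sketch below only illustrates that layout by converting a few toy VCF records. The helper name and the records are invented for the example, and in practice ANNOVAR's own conversion script (convert2annovar.pl) would be the safer route.

```python
def vcf_snps_to_annovar_rows(vcf_lines):
    """Yield 'chrom start end ref alt' rows for single-base substitutions.

    Header lines, indels and multi-base records are skipped; multi-allelic
    sites produce one row per alternate allele.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
                # For a SNP the start and end coordinates are identical.
                yield "\t".join([chrom, pos, pos, ref, alt])


# Toy VCF records, invented for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t50\tPASS\t.",
    "1\t20000\t.\tC\tT,G\t60\tPASS\t.",
    "2\t300\t.\tAT\tA\t40\tPASS\t.",
]

rows = list(vcf_snps_to_annovar_rows(toy_vcf))
for row in rows:
    print(row)
```

The deletion on chromosome 2 is dropped here because indels need their coordinates and alleles adjusted, which this sketch does not attempt.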

Another tool is the Ensembl variation API and a SNP effect predictor script from Ensembl, which will tell you the general consequences of a SNP (the ones listed above) for public/novel SNPs from any species. I think ANNOVAR lets you annotate with respect to Ensembl genes anyhow, so you might be best off with just ANNOVAR.

SNPNexus is also a 'general' consequence tool.

Papers for all of these tools are available on the gen2phen website.

Andrea: Thanks a lot for such a useful link, gen2phen.org. I was unaware of this resource. Your answer is like a complete lecture on my question. Thanks again. I mentioned splice sites because a very few SNPs are present in genes as well (0.15% in the coding region; I haven't calculated the splice-site SNPs), while the rest are spread all over the genome.

You're welcome. gen2phen was a godsend to me when I found it, so I thought I'd pass it on.

13.2 years ago
Laura ★ 1.8k

Ensembl contains lots of annotation of regulatory features (http://www.ensembl.org/info/docs/funcgen/index.html), and the variant effect predictor will tell you whether your SNPs overlap regulatory features and other annotations available through Ensembl (http://www.ensembl.org/tools.html).

To elaborate on this: the Ensembl variation API will tell you the general consequences of a SNP, and one of these is called RegulatoryRegion. The API will only class a SNP as being in a RegulatoryRegion if it falls within an ExternalFeature in the funcgen database. External features are imported from cisRED, miRanda and VISTA.

In the funcgen database there are two other types of feature: AnnotatedFeatures and RegulatoryFeatures. These are not currently included in the consequence prediction of the variation API (which is what the SNP effect predictor script uses), though of course you can look at RegulatoryFeatures and AnnotatedFeatures yourself through the funcgen database. I believe there are plans to include regulatory and annotated features in the consequence prediction soon.
