Investigating Snps Of Unknown Significance (Uv)
10.7 years ago
Andrea_Bio ★ 2.7k

Hello

I have available to me the entire set of SNPs for quite a few cows (I don't know the precise number at the moment). I am investigating the consequences of these SNPs. As a first step I have looked at the SNPs in exons and found their consequences using the Ensembl consequence feature. I have then taken the SNPs identified as non-synonymous and looked at their effect on protein structure using PolyPhen.

I would like to see if I can find out anything useful about the other SNPs of which there are hundreds of thousands.

Here are some of my ideas. I was wondering if I am missing anything else obvious.

1) Use tools to investigate the consequences of SNPs in non-essential splice sites, e.g. SpliceSiteFinder, MaxEntScan. I appreciate that the results of these programs can be inaccurate, but I believe you can get 80-90% accuracy for splice sites close to the exon/intron boundaries.

2) Use tools to scan cattle genome for potential intron splicing regulatory elements (enhancers and silencers) and look for SNPs in those regions. I believe tools to look for these types of regulatory regions are much less reliable than tools to look for splice sites as the 'consensus' sequence is so degenerate.

3) Identify SNPs in known RNA genes in the Ensembl cattle database. Also look for any putative RNA elements in the cattle genome using RNA prediction tools, and identify SNPs there.

What are the best tools to look for RNA genes (though I have another question open about this)? Are there any tools to predict the consequences of SNPs in miRNA?

4) Look for transcription factor binding sites in the genome and identify SNPs here. I'm not sure about this one and haven't really given it much thought. If anyone reading this question knows any opportunities/caveats off the top of their head it would certainly give me a head start but I admit this is just a cursory idea to me at the moment.

Are there any other major regulatory elements that I am missing? Molecular biology has changed so much that I don't know half of the things that exist anymore. No-one had heard of miRNA when I did my degree!!

I imagine there are lots of different DNA regulatory features. I just want to make sure I haven't missed the obvious ones. It would be impossible to cover all of the minor ones.

Many thanks in advance to anyone who answers. This forum is a really great resource. There seems to be a core of users who answer my questions and I appreciate every bit of input you give me.

Andrea

snp splicing rna • 4.8k views
10.7 years ago

This is a long and involved question and would cover 2 or 3 of the lectures I have given in the past on this very topic.

For microRNAs - try the Miranda database at www.miranda.org. Test both alleles to see what the allele-specific effect is. [Edit added on 1 Feb 2011 - see the recent paper by Brest, Hofman et al. on a synonymous SNP that directs allele-specific expression of IRGM via microRNA 196, with regard to Crohn's disease.]
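To illustrate what "test both alleles" can mean in practice, here is a minimal Python sketch that checks whether a SNP creates or destroys a canonical seed match (pairing with miRNA nucleotides 2-8) in a 3'-UTR. It is not how Miranda scores sites, which also considers pairing outside the seed and free energy; the function names and the toy sequences are made up for illustration.

```python
# Sketch: does a SNP create or destroy a miRNA seed match in a 3'-UTR?
# All sequences here are toy data, not real bovine sequence.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna):
    """Return the mRNA motif that pairs with miRNA nucleotides 2-8 (the seed)."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]

def count_sites(utr, mirna):
    """Count seed-match sites for mirna in utr (overlapping matches included)."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    return sum(utr.startswith(site, i) for i in range(len(utr) - len(site) + 1))

toy_mirna = "UAGCAGCACGUAAAUAUUGGCG"   # hypothetical miRNA
utr_ref = "GGAAUGCUGCUAACC"            # hypothetical UTR, reference allele
utr_alt = apply_snp(utr_ref, 7, "C")   # hypothetical U>C SNP at index 7

print(count_sites(utr_ref, toy_mirna), count_sites(utr_alt, toy_mirna))
```

A drop in the count from the reference to the alternate allele flags a SNP that removes a candidate site; a rise flags one that creates it. Either way it is only a prompt to look more closely with a proper target predictor.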

A SNP may alter the free energy of folding the mRNA and so affect stability. The Vienna RNA Package is good to use for this.
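A rough sketch of that comparison in Python follows, assuming the ViennaRNA Python bindings are installed and importable as `RNA`. The 50-base flank is an arbitrary choice of mine, not a recommendation from the package, and the helper names are hypothetical.

```python
def allele_windows(seq, pos, ref, alt, flank=50):
    """Return (ref_window, alt_window) around a 0-based SNP position."""
    if seq[pos].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    start = max(0, pos - flank)
    end = min(len(seq), pos + flank + 1)
    ref_window = seq[start:end]
    alt_window = seq[start:pos] + alt + seq[pos + 1:end]
    return ref_window, alt_window

def delta_mfe(seq, pos, ref, alt, flank=50):
    """Minimum free energy difference (alt minus ref, kcal/mol)."""
    import RNA  # ViennaRNA bindings, installed separately (assumed available)
    ref_window, alt_window = allele_windows(seq, pos, ref, alt, flank)
    _, mfe_ref = RNA.fold(ref_window)   # RNA.fold returns (structure, mfe)
    _, mfe_alt = RNA.fold(alt_window)
    return mfe_alt - mfe_ref
```

The window size matters: folding a short local window and folding the whole transcript can give different answers, so it is worth trying more than one flank length before reading much into a difference.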

SNPs that are upstream of exon 1, in exon 1 and in intron 1 can alter transcription factor binding sites. How far upstream of the exon 1 TSS do you look? That is a debatable question. We go at least 10 kbp, sometimes further. (If anyone argues that is too far, point them to the examples of LCT and lactose tolerance - those SNPs are 10 to 11 kbp upstream of the LCT gene and within MCM6.) We use MAPPER to identify putative TFBSs. Make sure you use a tool that employs TRANSFAC and JASPAR profiles of the binding sites.
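The idea behind profile-based TFBS tools can be sketched in a few lines of Python: score every window that covers the SNP against a position frequency matrix, once per allele, and compare. The matrix below is a made-up 4-column toy, not a TRANSFAC or JASPAR profile, and this is not MAPPER's actual method; it scans the forward strand only and assumes the sequence is at least as long as the motif.

```python
import math

# Hypothetical frequency matrix: one dict of base probabilities per motif position.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

def window_score(window, pfm, background=0.25):
    """Log-odds score of one window against the matrix (uniform background)."""
    return sum(math.log2(col[base] / background) for base, col in zip(window, pfm))

def best_score_over_snp(seq, pos, pfm):
    """Best score among forward-strand windows that cover 0-based position pos."""
    width = len(pfm)
    starts = range(max(0, pos - width + 1), min(pos, len(seq) - width) + 1)
    return max(window_score(seq[s:s + width], pfm) for s in starts)

def allele_scores(seq, pos, alt, pfm):
    """Return (reference score, alternate score) for a SNP at pos."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_score_over_snp(seq, pos, pfm), best_score_over_snp(alt_seq, pos, pfm)

ref_score, alt_score = allele_scores("TTAGCTTT", 3, "A", TOY_PFM)
print(round(ref_score, 2), round(alt_score, 2))
```

A large drop or gain between alleles marks a SNP worth following up. A real analysis would also scan the reverse strand and use a score threshold calibrated for each profile.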

Don't forget about exon splice enhancers. There are tools to identify putative ESEs. These reside in exons near splice sites.

Synonymous SNPs may alter translation kinetics with a consequence on protein structure. So, look at codon usage for your synonymous SNPs. Other SNPs in the coding region that alter an amino acid may also alter the score of a conserved domain or Pfam profile. So, I run Pfam to see how the different alleles affect the Pfam score. SIFT is also a good tool for the non-synonymous SNPs.
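For the codon usage part, a minimal Python sketch: count codons across a set of coding sequences, then compare how often the reference and alternate codons are used within their synonymous family. The standard genetic code is built in; the input CDS list is toy data and the function names are my own.

```python
from collections import Counter

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_counts(cds_sequences):
    """Count in-frame codons over an iterable of coding sequences."""
    counts = Counter()
    for cds in cds_sequences:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[i:i + 3]] += 1
    return counts

def usage_shift(ref_codon, alt_codon, counts):
    """Return (ref, alt) usage as fractions of their synonymous codon family."""
    amino_acid = CODON_TABLE[ref_codon]
    if CODON_TABLE[alt_codon] != amino_acid:
        raise ValueError("codons are not synonymous")
    family = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    total = sum(counts[c] for c in family)
    if total == 0:
        return 0.0, 0.0
    return counts[ref_codon] / total, counts[alt_codon] / total

counts = codon_counts(["CTGCTGCTGCTA"])   # toy CDS: three CTG, one CTA
print(usage_shift("CTG", "CTA", counts))
```

In practice the counts would come from all annotated bovine CDS (or a published codon usage table), and a synonymous SNP that swaps a common codon for a rare one is the kind to flag.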

If you have a SNP that lands between genes X and Y, use synteny to your advantage. Most bovine genes are in the same order in human and the human genome is much more deeply and richly annotated. So, look to see what is found between those two genes in the human genome - a lincRNA? a family of microRNAs? a gene encoding a hypothetical protein that may also exist in the bovine genome? any other small RNA genes?

OK, that should get you started. Good luck!

So, 3 weeks later (3 Dec 2010), there is more to add to my reply. See the table entitled "Computational methods to characterize risk loci function method description" (currently Table 2) in the WikiGenes article "Principles for the post-GWAS functional characterisation of risk loci."


that is a wonderful answer as ever Larry_Parnell


synteny is a great idea, i hadn't thought of that


You should also look at the question "Overrepresentation Analysis For Microrna Binding Sites In Utrs?" for answers here on microRNAs and 3'-UTRs.


Hi, I looked at that, thanks. It seems to me that there aren't any tools to perform ab initio prediction of RNA genes in a genome, unless I am mistaken.


I am certain that some groups have developed tools to predict microRNA genes. Other RNA genes may be found in EST collections that do not match an ENSEMBL or RefSeq protein-coding gene.

10.7 years ago

"Are there any other major regulatory elements that I am missing. Molecular biology has changed so much that I don't know half of the things that exist anymore. No-one had heard of miRNA when I did my degree!!"

You can get a fair amount of information about the current understanding of the non-coding genome from the review article "Annotating non-coding regions of the genome". I would also recommend taking a look at some of the excellent suggestions that Sean, Larry and GWW gave on my recent question about using Hi-C, ENCODE and other new datasets to analyze the non-coding genome.

10.7 years ago

And...

I neglected to add SNPs altering CpG islands. I have heard in a conference talk that one change, C > T for example, can affect methylation patterns of a CpG island of considerable length. I do not have a reference at hand, but one can use a CpG island predictor - or these are already mapped in some genome databases - to look at SNPs mapping within.

At the same time, I have also heard a professor whose lab looks at this sort of thing say that a single SNP within a CpG island is not sufficient to significantly alter the downstream effect of the methylation status of that sequence segment. So, there are differing views here, which means you should explore the SNP-in-CpG-island scenario carefully. (added 31 Mar 2011)
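One simple first pass, sketched here in Python, is to flag SNPs that create or destroy a CpG dinucleotide. This only looks at the two bases flanking the SNP; it says nothing about the methylation of the island as a whole, which is exactly the open question above. The function names are hypothetical.

```python
def cpg_count(seq):
    """Number of CG dinucleotides in seq."""
    return seq.upper().count("CG")

def cpg_effect(seq, pos, alt):
    """Classify a SNP at 0-based pos as a CpG 'gain', 'loss' or 'none'."""
    lo, hi = max(0, pos - 1), min(len(seq), pos + 2)
    ref_local = seq[lo:hi]                          # SNP plus one base each side
    alt_local = seq[lo:pos] + alt + seq[pos + 1:hi]
    diff = cpg_count(alt_local) - cpg_count(ref_local)
    if diff > 0:
        return "gain"
    if diff < 0:
        return "loss"
    return "none"

print(cpg_effect("ACGT", 1, "T"))   # C>T inside a CG dinucleotide
```

The C > T change mentioned above is the classic case of a "loss" when the C sits directly before a G.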


Ensembl has CpG islands predicted in its simple_feature table. For cow:

mysql -h ensembldb.ensembl.org -u anonymous -P 5306 -D bos_taurus_core_60_4i -e "SELECT sf.* FROM simple_feature sf JOIN analysis a ON a.analysis_id = sf.analysis_id WHERE a.logic_name = 'cpg';"


The Ensembl Variant Effect Predictor also reports whether a SNP falls in a regulatory element, which should be annotated in the funcgen database.
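Once the CpG island rows are exported, intersecting them with SNP positions is straightforward. A minimal Python sketch, assuming the islands have been reduced to (seq_region_start, seq_region_end) pairs for a single seq_region, that coordinates are 1-based inclusive, and that islands do not overlap each other; the function name and example numbers are made up.

```python
from bisect import bisect_right

def snps_in_islands(snp_positions, islands):
    """Return the SNP positions that fall inside any (start, end) island."""
    islands = sorted(islands)
    starts = [start for start, _ in islands]
    hits = []
    for pos in snp_positions:
        # Rightmost island whose start is <= pos, if there is one.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and islands[i][0] <= pos <= islands[i][1]:
            hits.append(pos)
    return hits

toy_islands = [(500, 650), (100, 200)]              # hypothetical islands
toy_snps = [50, 100, 150, 200, 201, 500, 700]       # hypothetical SNP positions
print(snps_in_islands(toy_snps, toy_islands))
```

The binary search keeps this quick even with hundreds of thousands of SNPs; run it once per chromosome, since the positions are only comparable within one seq_region.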