Question

How To Assess The Effect Of SNPs Based On Multiple Transcripts?

6

13.5 years ago

Khader Shameer 18k

I am currently looking at a bunch of SNPs from a genome-wide association study. A couple of these SNPs are non-synonymous and lie in coding regions. For example, for one particular SNP, we noticed that the gene encodes several transcripts and the SNP is present in only 5 of the 10 transcripts, including the longest transcript.

From a biological view point, what is the functional implication or functional effect of such a SNP, which may or may not be expressed in the protein product because of alternative splicing?

When discussing the results, is it all right to explain them using the longest transcript?

What is the best way to assess the effect of such a SNP, given that the gene carrying it encodes multiple transcripts due to alternate exons and overlapping splice sites?

Is there any tool that can predict the functional effect of such SNPs?

Looking forward to your thoughtful insights. Thanks in advance.

snp transcript variation • 5.6k views

ADD COMMENT • link updated 13.5 years ago by Larry_Parnell 16k • written 13.5 years ago by Khader Shameer 18k

score 5 · Answer 1 · 2010-10-19

13.5 years ago

Pierre Lindenbaum 161k

PolyPhen was recently updated, and it now uses all the transcripts (UCSC hg18 knownGene) for its batch queries: http://genetics.bwh.harvard.edu/pph2/bgi.shtml

For my part, when I'm looking for the functional effect of a SNP, I look at all of the gene's distinct transcripts.

ADD COMMENT • link 13.5 years ago by Pierre Lindenbaum 161k

1

The PolyPhen installation didn't proceed beyond the makeblastdb step: it ran for 2 days with still no sign of finishing the database formatting. In the end I got the results via the web interface using rs numbers.

ADD REPLY • link 13.5 years ago by Khader Shameer 18k

0

Thanks for the note on PolyPhen. I am still at step 7 of the PPH2 installation; makeblastdb terminated partway through and I am running it again. I hope the server will be up by tomorrow; if my installation fails, I will try the PolyPhen server.

ADD REPLY • link 13.5 years ago by Khader Shameer 18k

0

ok Khader, thanks for telling me

ADD REPLY • link 13.5 years ago by Pierre Lindenbaum 161k

score 5 · Answer 2 · 2010-10-19

Unfortunately, when alternative splicing is in play, there is no nice 1-to-1 mapping of gene to transcript or gene to protein. Biochemists often use the term "ensemble" to refer to a biomolecule that exists in multiple states (e.g., the intermediate states between when a protein is free and when it is completely bound to a strand of DNA).

I don't think there is any justification to look at any single transcript, whether it's the shortest or longest or in between. To borrow the term from biochemistry, you need to look at the "ensemble" of transcript (and protein?) states to really understand the effect of the SNP.

As Pierre suggested, I would include all of the distinct transcripts, but I would also maintain some record of which transcripts belong to the same gene. That way once you have looked at the effect of the SNP on each transcript/protein, you can look at the overall effect on the gene. It looks like you already have a good start on the kind of questions to ask.

What percentage of this gene's transcripts contain the SNP?
What percentage of this gene's protein sequences are changed as a result of this SNP?
What percentage of this gene's protein sequences are changed in binding domains (or other functionally important areas) as a result of this SNP?
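
As a rough sketch of how those three questions could be tallied for one gene, here is a small Python example. Everything in it is hypothetical: the `TranscriptEffect` record, the `summarize_gene` helper, the transcript IDs, and the flag values are made up for illustration, and it assumes one protein per transcript, which will not hold when several transcripts encode the same protein. Only the "5 of 10 transcripts" figure comes from the question.

```python
from dataclasses import dataclass


@dataclass
class TranscriptEffect:
    """Per-transcript annotation of one SNP (hypothetical record layout)."""
    transcript_id: str
    contains_snp: bool      # SNP position falls inside this transcript
    protein_changed: bool   # SNP alters the encoded amino acid sequence
    in_domain: bool         # the change lies in an annotated functional domain


def summarize_gene(effects):
    """Return the three percentages for all transcripts of a single gene."""
    total = len(effects)
    if total == 0:
        raise ValueError("no transcripts given")

    def pct(count):
        return 100.0 * count / total

    return {
        "pct_with_snp": pct(sum(e.contains_snp for e in effects)),
        "pct_protein_changed": pct(sum(e.protein_changed for e in effects)),
        "pct_domain_changed": pct(
            sum(e.protein_changed and e.in_domain for e in effects)
        ),
    }


# Made-up example: 10 transcripts, 5 carry the SNP, 4 of those change the
# protein, and 2 of those changes fall in a functional domain.
example = (
    [TranscriptEffect(f"tx{i}", True, True, True) for i in range(2)]
    + [TranscriptEffect(f"tx{i}", True, True, False) for i in range(2, 4)]
    + [TranscriptEffect("tx4", True, False, False)]
    + [TranscriptEffect(f"tx{i}", False, False, False) for i in range(5, 10)]
)

print(summarize_gene(example))
```

Keeping the per-transcript records grouped by gene is what allows the per-transcript results to be rolled up into a gene-level summary afterwards.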

Hopefully this gives you something to think about. These ideas are still kinda half-baked; I haven't put this into practice myself, but these are some of the thoughts that I had.

score 4 · Answer 3 · 2010-10-19

13.5 years ago

Mary 11k

"From a biological view point" I would say I would want to know about the spatial/temporal expression patterns of the transcripts/proteins that carry the alteration. That said, I can imagine that in some types of proteins it wouldn't matter much. In others, if it was in a domain with key functionality, or affected folding, it could matter a lot.

"longest transcript" Yeah, I would think it was fine to describe it in the context of the longest transcript if you are trying to orient people. But I might consider choosing the UCSC "canonical" or maybe a reference one from CCDS or GENCODE or something? Or do it the way MapViewer or SeattleSNPs does, compressing/flattening all transcripts into a reference diagram, knowing that it may not really represent a version that is actually used. As long as it's clear which way you decided to go and there's a reason for it, it would make sense. But I guess I would want to know what the most common transcript is too: the longest is not necessarily the most common, and could even be rare. A gene I worked on in grad school was like that. And we also had a different long transcript version that was long because of a 3' UTR, not a coding piece.

I think if I was asked to consider a gene with splice variants I'd want a diagram like the GVS gives, with SNPs and transcripts--and you can assess for yourself which SNPs would impact which transcripts: gene at GVS with alternate transcripts

"assess the effect" would bring me back to tissue/cell type/time point, possible interaction partners, etc.

Tools: this SIFT page (http://sift.jcvi.org/) also links to some other tools on the left side that I'd try. Maybe PMut too (http://mmb.pcb.ub.es/PMut/).

Is that the sort of stuff you were looking for? Or were you just looking for computational answers? My PhD was on a muscle splice variant in a cell biology lab and I think like a cell biologist on that....

ADD COMMENT • link 13.5 years ago by Mary 11k

0

Good point about different tissue/cell types, time points, and interactions. All things to think about when it comes time to explain "the effect of the SNP." Often capturing the entire ensemble just isn't possible, and we need to be explicit about some of these things.

ADD REPLY • link 13.5 years ago by Daniel Standage 4.1k

0

Thanks for your exciting thoughts. I am wondering if there is a way (preferably a computational method) to find what could be the most common transcript of a gene? Thanks a lot for the suggestion on tools; I will try them and provide my feedback.

ADD REPLY • link 13.5 years ago by Khader Shameer 18k

0

@Khader: If it was human (or some model org with good expression data sets) I would personally look through Affy chips to see if different pieces of my transcripts had been used as probes. You may (or may not) get lucky. If so, look at the levels. GEO probably for that. (have to continue my answer in 2 more comments due to size constraints in this reply...)

ADD REPLY • link 13.5 years ago by Mary 11k

0

@Khader: I would also look through the EST db collections and see if any certain patterns or numbers emerged with certain tissues. However, EST annotations are not great. Lots of tissue = N/A. However, there are good brain sets, cardiac sets, etc, that might offer insights. Lots of ESTs that have a certain exon would make me pleased. At least that would be a hunting license. But it wouldn't be definitive.

ADD REPLY • link 13.5 years ago by Mary 11k

0

@Khader: But mostly I'd be tempted to run some Northern blots (for assessing the size of transcripts that hyb w/ my exon probes), or do some PCR with appropriate primers and various tissues (to see which tissues do/don't have my exons). One time I used a summer student for this sort of thing :)

ADD REPLY • link 13.5 years ago by Mary 11k

0

Thanks a lot Mary. I will try my luck with GEO and dbEST first. The tools GVS and PMut look very interesting.

ADD REPLY • link 13.5 years ago by Khader Shameer 18k

0

This particular gene is expressed in multiple tissues, but I am looking at the one in erythrocytes, so I can't pursue a search via mRNA. Currently I am checking the erythrocyte literature to see which transcript is being used in wet lab experiments.

ADD REPLY • link 13.5 years ago by Khader Shameer 18k

score 2 · Answer 4 · 2010-10-22

I'd use the exon containing the SNP with the GWAS hit as a query in a BLAST search against the EST database. Look at those results carefully (i.e., source tissue of the mRNA that went into making the EST library) to get an idea of where that exon (and its mRNA isoforms) are expressed. What if your GWAS is for a brain/neurological phenotype, but you have no EST evidence that this exon is expressed in brain or nerve tissue or neuronal cells? That may be OK, but may necessitate reviewing how that phenotype arises. (An example is the emerging role of bone and osteoblasts in obesity.)
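
Once the BLAST hits are in hand, the tissue review described above can start with a simple tally. The sketch below assumes the hits have already been reduced to (EST identifier, source tissue) pairs; the `tally_tissues` helper, the identifiers, and the tissue labels are all hypothetical, and running BLAST or parsing its output is not shown. Hits with a missing or "N/A" tissue annotation are pooled so they stay visible rather than silently dropped.

```python
from collections import Counter


def tally_tissues(est_hits, unknown_labels=("", "n/a", "na", "unknown")):
    """Count EST hits per source tissue, pooling unannotated hits together."""
    counts = Counter()
    for _est_id, tissue in est_hits:
        # Normalize case and whitespace so "Brain" and "brain " count together.
        label = (tissue or "").strip().lower()
        if label in unknown_labels:
            counts["unannotated"] += 1
        else:
            counts[label] += 1
    return counts


# Made-up hits for illustration only.
hits = [
    ("EST0001", "Brain"),
    ("EST0002", "brain"),
    ("EST0003", "N/A"),
    ("EST0004", "heart"),
    ("EST0005", None),
]

for tissue, n in tally_tissues(hits).most_common():
    print(tissue, n)
```

A large "unannotated" count is a reminder that absence of a tissue in the tally is weak evidence, since many EST libraries carry no usable tissue annotation.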

I'd use SIFT and PolyPhen to consider how detrimental the change in protein sequence is. I also look at Pfam to see how well or poorly each allele is accepted within the functional domain. I then run a protein structure predictor to get some idea of how much the minor (or disease-associated) allele affects secondary or tertiary structure. I like JPRED for this.

Lastly, there is also the possibility that a SNP that is in an exon, but close to a splice acceptor, resides within an ESE (exonic splicing enhancer). I know there is an ESE predictor out there, but cannot recall which one I've seen presented at a conference.

In short, I would not be narrow in focus, but use a suite of tools to consider a range of functional possibilities.