6.0 years ago
Joel Wallenius
Hello!
I have a list of RefSeq accession numbers of mRNA transcripts, all from Homo sapiens. For each transcript, I would like to get the corresponding transcript (and its sequence) from a number of related species, such as mouse, rat, cow, and monkey.
I can't imagine that no database for this already exists, but I've spent hours googling and have only found resources that do ALMOST what I want, such as InParanoid, OrthoDB, and Ensembl...
Surely many others before me have wished to do something like this? What am I missing?
Thanks in advance and merry weekend!
Joel
Using NCBI eutils (the example uses BRCA2). The following is complicated (perhaps unnecessarily), but it will get you the sequence. Unfortunately, it retrieves the sequences of individual exons, each with only a generic FASTA header.
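As a minimal sketch of the fetch step (not the original commands), assuming the standard E-utilities `efetch.fcgi` endpoint with `db=nuccore`, `rettype=fasta` and `retmode=text`: the code below only builds the request URL and parses the FASTA text that comes back, so it can be tried offline. The download itself is left out, and `NM_000059` is used purely as an illustrative RefSeq accession.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_fasta_url(accessions, db="nuccore"):
    """Build an efetch URL that returns FASTA text for the given accessions."""
    params = {
        "db": db,
        "id": ",".join(accessions),  # efetch accepts a comma-separated ID list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:  # close the last record
        records[header] = "".join(chunks)
    return records
```

To actually download, something like `urllib.request.urlopen(efetch_fasta_url(["NM_000059"]))` followed by `parse_fasta(resp.read().decode())` should work. Keep in mind that NCBI asks for no more than three requests per second without an API key, so batching accessions into one `id` list is kinder than looping one at a time.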
I will think about alternative approaches, but someone else may come up with an answer in the meantime.
Here is another version. Same limitations.
I am completely unfamiliar with these programs. As you say, it seems very complicated. I would have thought a question like mine had been asked many times before, so why is the answer so complex? :/
There are multiple sources of genome alignments for multiple genomes. UCSC provides pairwise and multiple sequence alignments for many genomes. Probably more useful are the HomoloGene alignments for proteins (BRCA2 example).
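If you go the UCSC route, the multiple alignments are distributed as MAF files. Here is a minimal sketch, assuming one MAF block has already been read into a list of lines: it collects the `s` rows (fields `s src start size strand srcSize text`) and cuts every species' row at the columns covering a stretch of the reference sequence. The source names `hg38.chr13` and `mm10.chr5` in the test are hypothetical examples, and offsets here are relative to the block, not to the chromosome.

```python
def parse_maf_block(lines):
    """Return {src: aligned_text} for the 's' lines of one MAF alignment block."""
    rows = {}
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "s":
            # s src start size strand srcSize text
            rows[fields[1]] = fields[6]
    return rows


def slice_by_reference(rows, ref_src, ref_start, ref_end):
    """Cut every row at the alignment columns spanning reference bases
    ref_start..ref_end (0-based, end-exclusive, counted within the block)."""
    ref = rows[ref_src]
    # alignment column index of each non-gap reference base
    cols = [i for i, c in enumerate(ref) if c != "-"]
    first, last = cols[ref_start], cols[ref_end - 1]
    return {src: text[first:last + 1] for src, text in rows.items()}
```

Slicing by reference coordinates rather than raw columns means gaps inserted in the human row do not shift the motif window.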
Sadly, I can't rely on the protein alignments, though I know they are better suited for homology in most cases. I have to work with the transcripts, since what I'm interested in is the exact conservation of a base-pair sequence motif.
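Once you have one transcript sequence per species, checking for an exact motif is simple. The sketch below assumes the sequences are already in a plain dict keyed by species name (the species names and sequences in the test are made up). It reports every start position of an exact, case-insensitive match on the given strand, using a regex lookahead so overlapping hits are not missed.

```python
import re


def motif_hits(sequences, motif):
    """Map each species to the 0-based start positions of exact motif matches."""
    # zero-width lookahead so overlapping occurrences are all reported
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return {sp: [m.start() for m in pattern.finditer(seq.upper())]
            for sp, seq in sequences.items()}


def conserved_in_all(sequences, motif):
    """True if every species has at least one exact match of the motif."""
    return all(motif_hits(sequences, motif).values())
```

Note that this only says the motif occurs somewhere in each transcript; whether it sits at the homologous position still needs an alignment.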
I'm starting to suspect that I shouldn't worry too much about finding the true orthologues and should instead just take the first transcript listed at NCBI for each gene... tedious, though!
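The "first transcript per gene" shortcut is easy to automate once you have a table of (species, gene, accession) rows in NCBI's listing order. This is a minimal sketch under that assumption; the accessions in the test are hypothetical placeholders. As an optional tweak, it swaps a model `XM_` accession for the first curated `NM_` one when both exist for a gene.

```python
def first_transcript_per_gene(records, prefer_curated=True):
    """Pick one accession per (species, gene), keeping input order.

    records: iterable of (species, gene, accession) tuples.
    With prefer_curated=True, the first NM_ accession replaces an
    earlier non-NM_ pick (e.g. a predicted XM_ model) for that gene.
    """
    chosen = {}
    for species, gene, accession in records:
        key = (species, gene)
        current = chosen.get(key)
        if current is None:
            chosen[key] = accession
        elif (prefer_curated and not current.startswith("NM_")
              and accession.startswith("NM_")):
            chosen[key] = accession
    return chosen
```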