Question: Merge "NCBI species" data within Ensembl alignment
Asked 3.0 years ago by tlorin:

Apologies if this is a really naive question, but I cannot figure out how to do this easily. Here is a related post regarding the best method to find orthologous genes of a species.

Let's say I have a protein alignment downloaded from Ensembl (coming, for instance, from this Ensembl tree).

This gene is also present in some other "NCBI species" that I would like to include in my tree (for instance, Stegastes partitus, whose genome is available in the NCBI database but NOT in Ensembl). Indeed, if I manually run blastp with the asip protein sequence of D. rerio (extracted from my Ensembl multifasta protein alignment) against the nr database restricted to S. partitus, I find this sequence as the first BLAST hit. Perfect! I can then manually append it to my initial protein tree.
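To pick out that first hit programmatically rather than by hand, one option is to ask blastp for tabular output (`-outfmt 6`) with a custom column list ending in `sseq`, the aligned part of the subject sequence, and parse it. This is a minimal sketch assuming the column order `qseqid sseqid pident evalue bitscore sseq`; the function name `first_hit` is illustrative. BLAST lists hits best-first by default, and because `sseq` covers only the aligned region, the returned sequence may be shorter than the full NCBI protein.

```python
from typing import Optional

# Assumed column order, i.e. blastp run with:
#   -outfmt "6 qseqid sseqid pident evalue bitscore sseq"
FIELDS = ["qseqid", "sseqid", "pident", "evalue", "bitscore", "sseq"]


def first_hit(tabular_output: str) -> Optional[dict]:
    """Return the first hit in BLAST tabular output, or None if there are no hits."""
    for line in tabular_output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in outfmt 7)
        hit = dict(zip(FIELDS, line.split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        # sseq is the aligned part of the subject and may contain gap characters
        hit["sseq"] = hit["sseq"].replace("-", "")
        return hit
    return None
```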

The problem is that I don't have one gene and one NCBI species but many of each (say p genes and n NCBI species). I already have an Ensembl protein multifasta file for each of my p genes.

My question is: is there an easy way to append to each of my p multifasta files the corresponding homologous protein sequence(s) of the n "NCBI species"?
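Once a best hit is known for each gene and species, the appending step itself is a plain loop over the p × n combinations. This is a minimal sketch assuming one file per gene named `<gene>.fa` in a single directory and new headers of the form `>gene_species`; both conventions are hypothetical. Species with no hit are given as `None` and skipped. The appended sequences are unaligned, so the file is no longer a proper alignment until it is realigned.

```python
from pathlib import Path


def append_hits(fasta_dir, hits, width=60):
    """Append homologous sequences to per-gene FASTA files.

    hits: {gene: {species: ungapped protein sequence or None}}
    Assumes (hypothetically) one file per gene named "<gene>.fa".
    """
    for gene, by_species in hits.items():
        path = Path(fasta_dir) / f"{gene}.fa"
        with path.open("a") as handle:  # "a" appends, creating the file if missing
            for species, seq in by_species.items():
                if not seq:
                    continue  # no BLAST hit for this species
                handle.write(f">{gene}_{species}\n")
                # wrap the sequence at a fixed line width
                for i in range(0, len(seq), width):
                    handle.write(seq[i:i + width] + "\n")
```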

Thanks for any insight!


Since no one has said anything, I will take a stab.

I don't think it would be possible to easily script what you are asking for. Decisions need to be made about what to select from one site/database before adding that information to data from a second site.

Reply written 3.0 years ago by genomax

Like genomax said, this isn't trivial.

What you may want to do instead is work the other way around: take your known asip example (e.g. zebrafish) and blast it against a list of species you are interested in. I'm picturing something where you have a list of taxonomy IDs for the species you're interested in, then blast your reference against the NR database filtered to each species in turn. If I recall correctly, you can set up the BLAST output format to include the sequence of the hit.
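A sketch of that per-species search, assuming a local BLAST+ installation with a version 5 nr database (the `-taxids` option needs one; for remote searches, `-remote` together with an `-entrez_query` such as `txid<ID>[ORGN]` is the usual alternative). The helper names are illustrative, and `sseq` in the output column list is what puts the hit sequence into the output. `-max_target_seqs 1` limits the number of subjects reported; check its behaviour in your BLAST+ version before relying on it.

```python
import subprocess

OUTFMT = "6 qseqid sseqid pident evalue bitscore sseq"  # sseq = aligned hit sequence


def blastp_command(query_fasta, taxid, db="nr", max_hits=1):
    """Build a blastp command restricted to one taxonomy ID (assumes a v5 database)."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-taxids", str(taxid),
        "-outfmt", OUTFMT,
        "-max_target_seqs", str(max_hits),
    ]


def search_species(query_fasta, taxids, db="nr"):
    """Run one blastp search per taxonomy ID; return {taxid: tabular output}."""
    results = {}
    for taxid in taxids:
        proc = subprocess.run(
            blastp_command(query_fasta, taxid, db=db),
            capture_output=True, text=True, check=True,
        )
        results[taxid] = proc.stdout
    return results
```

Each tabular output can then be reduced to its first hit and appended to the matching gene file.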

After that, you'd have to reconstruct the tree, which is a whole different issue. I'm not sure how you are manually adding things to your tree.
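Because the appended NCBI sequences are unaligned while the Ensembl file is already an alignment, one preparatory step before rebuilding the tree is to strip the gap characters and realign everything together with an aligner of your choice (the thread does not name one). A minimal sketch, assuming `-` is the only gap character; the function names are illustrative.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def degap(records):
    """Remove '-' gap characters so all sequences can be realigned together."""
    return [(header, seq.replace("-", "")) for header, seq in records]
```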

Reply written 3.0 years ago by pld

