Question: How can I download a fasta file of all homologs to a given protein?
A248 wrote, 2.3 years ago:

I have the amino acid sequence of a gene of interest. I want to identify all its homologs in different species and assemble them into a single FASTA file, which I then want to feed into a sequence similarity network (SSN) tool to see how the homologs from different species cluster.

My problem is: how do I get this list of homologs into one FASTA file? I tried two approaches. The first was to find the protein in the KEGG database and click on 'Orthologs'. This gives all the homologs in KEGG (about 2500), but I don't see a way to download the amino acid sequences in one go, short of clicking each homolog individually and copying the aa sequence.

The second option is to run BLASTP using RefSeq as the search database. This gives me only a limited number of hits (restricted to 100), and I still can't figure out how to download the FASTA sequences of the hits in one go.

Can someone please offer some help? Thanks in advance!

sequence alignment • 1.2k views

What kind of species are we talking about here? If it's plants (or "related"), you should have a look at PLAZA; the whole purpose of this resource is exactly homology, gene families, etc.

(comment by lieven.sterck, 2.3 years ago)
swati.6783 wrote, 2.3 years ago:

Hi, you can get your homologs using BLASTP. Hits are restricted to 100 by default, but you can increase this to up to 20000 in the BLASTP web interface: click on the advanced parameters to get the drop-down menu for adjusting the maximum number of hits.

On the results page, just select all hits; there are options to download the complete FASTA sequences, as well as the aligned sequences if you want them.
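If you would rather script the download step, here is a minimal sketch using only the Python standard library and NCBI's E-utilities `efetch` endpoint. It assumes you have already saved the hit accessions one per line in a text file; the file names `hits.txt` and `homologs.fasta`, the helper names, and the batch size of 200 are illustrative choices, not anything prescribed by NCBI.

```python
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(lines):
    """Collect unique accessions from text lines, skipping blanks and '#' comments."""
    seen, out = set(), []
    for line in lines:
        acc = line.strip()
        if acc and not acc.startswith("#") and acc not in seen:
            seen.add(acc)
            out.append(acc)
    return out


def batches(items, size=200):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_params(accessions):
    """Form fields asking efetch for protein records as plain-text FASTA."""
    return {"db": "protein", "id": ",".join(accessions),
            "rettype": "fasta", "retmode": "text"}


def download_fasta(accessions, out_path, pause=0.4):
    """Write all batches into one FASTA file (needs network access)."""
    with open(out_path, "w") as out:
        for batch in batches(accessions):
            data = urllib.parse.urlencode(efetch_params(batch)).encode()
            # Passing `data` makes this a POST, which avoids very long URLs.
            with urllib.request.urlopen(EFETCH, data=data) as resp:
                out.write(resp.read().decode())
            time.sleep(pause)  # keep the request rate low for NCBI


# Example (hypothetical file names):
# with open("hits.txt") as fh:
#     download_fasta(read_accessions(fh), "homologs.fasta")
```

The resulting file should be usable as input for the SSN tool, but it is worth checking the record count against the number of hits you expected.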

Hope it helps.

Thanks, Swati

Joe (United Kingdom) wrote, 2.3 years ago:

Is this sufficient for what you want to do?

If you have a gene name, you can find all the proteins that are known to match it in the NCBI Protein database, and they can all be downloaded as FASTA CDS files.
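The same search can probably be scripted through NCBI's E-utilities. The sketch below is one way to do it under a few assumptions: it searches the Protein database on the `[Gene Name]` field, keeps the result set on the history server, and pages through it with `rettype=fasta` (amino acid sequences rather than CDS). The function names and the page size of 500 are made up for illustration.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(gene):
    """URL for a Protein search on the Gene Name field, stored on the history server."""
    params = {"db": "protein", "term": f"{gene}[Gene Name]",
              "usehistory": "y", "retmax": 0}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def parse_esearch(xml_text):
    """Return (hit count, WebEnv, query_key) from an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return (int(root.findtext("Count")), root.findtext("WebEnv"),
            root.findtext("QueryKey"))


def efetch_url(webenv, query_key, retstart=0, retmax=500):
    """URL for one page of FASTA records from a stored search."""
    params = {"db": "protein", "WebEnv": webenv, "query_key": query_key,
              "rettype": "fasta", "retmode": "text",
              "retstart": retstart, "retmax": retmax}
    return EUTILS + "efetch.fcgi?" + urllib.parse.urlencode(params)


def fetch_gene_proteins(gene, out_path, page=500, pause=0.4):
    """Search, then page through the hits into one FASTA file (needs network)."""
    with urllib.request.urlopen(esearch_url(gene)) as resp:
        count, webenv, key = parse_esearch(resp.read().decode())
    with open(out_path, "w") as out:
        for start in range(0, count, page):
            url = efetch_url(webenv, key, start, page)
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read().decode())
            time.sleep(pause)  # keep the request rate low for NCBI
    return count
```

Note that a gene-name search returns everything annotated with that name, so the hits are only as good as the annotation; you may want to add an organism or RefSeq filter to the search term.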


genomax (United States) wrote, 2.3 years ago:

Have you looked at NCBI's HomoloGene database? Ensembl also has similar solutions available. If your protein has a standard accession, you should be able to find it in these databases.
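For the Ensembl route, a rough sketch using the Ensembl REST `homology/symbol/:species/:symbol` endpoint might look like the following. The layout of the JSON reply (`data` → `homologies` → `target` with `species`, `protein_id`, and `seq` or `align_seq`) is an assumption from memory and should be checked against the current Ensembl REST documentation; the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def homology_url(species, symbol):
    """Endpoint URL asking for orthologues with unaligned protein sequences."""
    query = urllib.parse.urlencode(
        {"type": "orthologues", "sequence": "protein", "aligned": 0})
    return f"{SERVER}/homology/symbol/{species}/{symbol}?{query}"


def fetch_orthologues(species, symbol):
    """Download and decode the JSON reply (needs network access)."""
    req = urllib.request.Request(homology_url(species, symbol),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def homologies_to_fasta(reply):
    """Turn the (assumed) reply structure into FASTA text, one record per homolog."""
    records = []
    for entry in reply.get("data", []):
        for hom in entry.get("homologies", []):
            target = hom.get("target", {})
            # Assumed field names; gap characters are stripped in case the
            # sequence comes back aligned.
            seq = (target.get("seq") or target.get("align_seq") or "").replace("-", "")
            if seq:
                name = target.get("protein_id", target.get("id"))
                records.append(f">{name} {target.get('species')}\n{seq}\n")
    return "".join(records)


# Example (gene symbol chosen only for illustration):
# fasta = homologies_to_fasta(fetch_orthologues("homo_sapiens", "BRCA2"))
# with open("orthologues.fasta", "w") as fh:
#     fh.write(fasta)
```

This only covers species present in Ensembl's comparative genomics data, so it is a complement to a BLAST-based search rather than a replacement.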
