Question: How can I download a FASTA file of all homologs of a given protein?
3 months ago, A248 wrote:

I have the amino acid sequence of a gene of interest. I want to identify all of its homologs in different species and assemble them into a single FASTA file, which I then want to input into a sequence similarity network (SSN) tool to see how the homologs from different species cluster.
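
For context on what the SSN step does with that FASTA file: SSN tools generally compare every sequence against every other, draw an edge between two sequences when their alignment score passes a chosen threshold, and then look at how the resulting graph breaks into clusters. The toy sketch below illustrates only that idea; it assumes you already have all-vs-all BLAST results in the default 12-column tabular format (`-outfmt 6`), uses a bit-score cutoff, and treats connected components as clusters. Real SSN tools add their own scoring, filtering and visualization on top of this.

```python
"""Toy sequence similarity network (SSN): threshold all-vs-all BLAST hits, then cluster."""
from collections import defaultdict


def parse_outfmt6(lines):
    """Yield (query, subject, evalue, bitscore) from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        yield cols[0], cols[1], float(cols[10]), float(cols[11])


def ssn_clusters(lines, min_bitscore):
    """Connect pairs with bit score >= min_bitscore and return the connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # first sight of a sequence makes it a node
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, _evalue, bitscore in parse_outfmt6(lines):
        find(query)  # register both sequences even if no edge passes the cutoff
        find(subject)
        if query != subject and bitscore >= min_bitscore:
            union(query, subject)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    # Largest clusters first; ties broken by member names for stable output
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

Raising `min_bitscore` splits the network into tighter clusters; lowering it merges more distant homologs, which is the same trade-off you tune interactively in an SSN tool.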

My problem is: how do I get this list of homologs into one FASTA file? I tried two approaches. The first is to find the protein in the KEGG database and click 'Orthologs'. This lists all the homologs in KEGG (about 2,500), but I don't see a way to download their amino acid sequences in one go, short of clicking each homolog individually and copying its sequence.
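
KEGG also exposes its data through a REST API, which avoids clicking through each entry. A minimal sketch, under several assumptions: that your protein's KEGG gene page lists a KO (KEGG Orthology) entry (the genes in that KO may not be identical to the SSDB 'Orthologs' list), that the `/link/genes/<KO>` endpoint returns tab-separated `ko:...<tab>org:gene` pairs, and that the `get` operation with the `aaseq` option is called with at most 10 entries per request. `download_ko_fasta` and the other helper names are hypothetical.

```python
"""Sketch: download amino-acid sequences for all genes linked to a KEGG Orthology (KO) entry."""
import time
import urllib.request

KEGG_REST = "https://rest.kegg.jp"


def parse_kegg_link(text):
    """Return gene IDs (e.g. 'hsa:124') from tab-separated KEGG 'link' output."""
    ids = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) == 2:
            ids.append(parts[1])
    return ids


def batches(items, size=10):
    """Split items into chunks; KEGG 'get' accepts at most 10 entries per request."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def aaseq_url(gene_ids):
    """Build a KEGG 'get' URL returning amino-acid FASTA for up to 10 genes."""
    return f"{KEGG_REST}/get/{'+'.join(gene_ids)}/aaseq"


def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def download_ko_fasta(ko_id, out_path, pause=0.5):
    """Write one multi-FASTA file with the sequences of every gene linked to ko_id.

    Assumption: /link/genes/<KO> lists the member genes; check the KEGG API docs.
    """
    gene_ids = parse_kegg_link(fetch(f"{KEGG_REST}/link/genes/{ko_id}"))
    with open(out_path, "w") as out:
        for chunk in batches(gene_ids):
            out.write(fetch(aaseq_url(chunk)))  # each response is already FASTA text
            time.sleep(pause)  # be polite to the public server
    return len(gene_ids)
```

Usage: `download_ko_fasta("K00001", "ko_orthologs.faa")`, where `K00001` is only a placeholder for the KO shown on your protein's page. With about 2,500 genes this is roughly 250 requests.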

The second approach is to run BLASTP against RefSeq. This gives only a limited number of hits (capped at 100), and I still can't figure out how to download the FASTA sequences of the hits in one go.

Can someone please offer some help? Thanks in advance!

Tags: sequence alignment

Comment by lieven.sterck (3 months ago): What kind of species are we talking about here? If it's plants (or related organisms), you should have a look at PLAZA; the whole purpose of that resource is exactly homology, gene families, etc.
3 months ago, swati.6783 wrote:

Hi, you can get your homologs using BLASTP. Hits are restricted to 100 by default, but in the BLASTP web interface you can raise this to as many as 20,000: open the algorithm parameters section and choose a larger value from the "Max target sequences" drop-down menu.
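
If you would rather script the search than use the web form, NCBI's BLAST URL API (`Blast.cgi`) accepts a `HITLIST_SIZE` field that plays the same role as the drop-down. A minimal sketch, assuming `refseq_protein` as the database and a default of 5,000 hits (the allowed maximum is set by NCBI, not by this code); it submits with `CMD=Put`, reads the request ID (RID), polls `FORMAT_OBJECT=SearchInfo` about once a minute, and pulls `<Hit_accession>` values out of the XML results. The function names are hypothetical.

```python
"""Sketch: run BLASTP via NCBI's BLAST URL API with a larger hit list, then list hit accessions."""
import re
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_params(protein_seq, max_hits=5000, database="refseq_protein"):
    """Form fields for a CMD=Put submission; HITLIST_SIZE raises the hit cap."""
    return {"CMD": "Put", "PROGRAM": "blastp", "DATABASE": database,
            "QUERY": protein_seq, "HITLIST_SIZE": str(max_hits)}


def parse_rid(put_response):
    """Extract the request ID from the QBlastInfo block of a Put response."""
    match = re.search(r"RID = (\S+)", put_response)
    return match.group(1) if match else None


def parse_status(search_info):
    """Return the Status value (WAITING, READY, FAILED, UNKNOWN) from a SearchInfo page."""
    match = re.search(r"Status=(\w+)", search_info)
    return match.group(1) if match else None


def hit_accessions(blast_xml):
    """List <Hit_accession> values from BLAST XML output, in rank order."""
    return [el.text for el in ET.fromstring(blast_xml).iter("Hit_accession")]


def _request(params, post=False):
    """POST for submissions, GET for polling and results."""
    query = urllib.parse.urlencode(params)
    if post:
        req = urllib.request.Request(BLAST_URL, data=query.encode())
    else:
        req = urllib.request.Request(f"{BLAST_URL}?{query}")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def run_blastp(protein_seq, max_hits=5000):
    rid = parse_rid(_request(put_params(protein_seq, max_hits), post=True))
    if rid is None:
        raise RuntimeError("no RID found in the BLAST submission response")
    while True:
        time.sleep(60)  # NCBI asks clients not to poll a single RID more than once a minute
        status = parse_status(_request({"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid}))
        if status == "READY":
            break
        if status != "WAITING":
            raise RuntimeError(f"BLAST search {rid} ended with status {status}")
    return hit_accessions(_request({"CMD": "Get", "FORMAT_TYPE": "XML", "RID": rid}))
```

The accession list returned by `run_blastp` can then be fed to any batch FASTA download (for example NCBI efetch) to build the single file.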

On the results page, just select all the hits; there are options to download the complete sequences in FASTA format, and you can also download the aligned sequences if you want them.
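
If you saved a hit table instead (or collected accessions some other way), NCBI E-utilities `efetch` with `db=protein`, `rettype=fasta` and `retmode=text` can turn the list into one multi-FASTA file. A sketch, assuming the subject accession is in the second column of a tab- or comma-separated hit table and assuming a batch size of 200 IDs per POST; the helper names are hypothetical.

```python
"""Sketch: turn a list of protein accessions into one multi-FASTA file with NCBI E-utilities."""
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def accessions_from_hit_table(lines):
    """Unique subject accessions (column 2) from a BLAST hit table, in first-seen order."""
    seen, ordered = set(), []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t") if "\t" in line else line.split(",")
        acc = cols[1].strip()
        if acc not in seen:
            seen.add(acc)
            ordered.append(acc)
    return ordered


def efetch_params(accessions, api_key=None):
    """Parameters for one efetch call returning protein FASTA as plain text."""
    params = {"db": "protein", "id": ",".join(accessions), "rettype": "fasta", "retmode": "text"}
    if api_key:
        params["api_key"] = api_key
    return params


def download_fasta(accessions, out_path, batch_size=200, api_key=None):
    with open(out_path, "w") as out:
        for i in range(0, len(accessions), batch_size):
            data = urllib.parse.urlencode(efetch_params(accessions[i:i + batch_size], api_key)).encode()
            with urllib.request.urlopen(EFETCH, data=data) as resp:  # POST suits long ID lists
                out.write(resp.read().decode("utf-8"))
            time.sleep(0.4)  # stay under ~3 requests/second without an API key
```

Deduplicating accessions before fetching keeps multiple HSPs against the same subject from producing repeated FASTA records.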

Hope it helps.

Thanks, Swati

3 months ago, jrj.healey (United Kingdom) wrote:

Is this sufficient for what you want to do?

If you have a gene name, you can find all the proteins that are known to match it in the NCBI Protein database, and they can all be downloaded in one go as FASTA CDS files.
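
The same gene-name search can be scripted with E-utilities: `esearch` with `usehistory=y` stores the result set on NCBI's history server, and `efetch` then pages through it as FASTA. Note that this matches on annotation, so it may miss homologs annotated under a different name and include unrelated proteins that share one. A sketch, assuming the `[Gene Name]` and `[Organism]` field tags and a page size of 500; the function names are hypothetical.

```python
"""Sketch: fetch every NCBI Protein record matching a gene name as FASTA (E-utilities history)."""
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_query(gene, organism=None):
    """Build an Entrez search term restricted to the gene-name field."""
    term = f"{gene}[Gene Name]"
    if organism:
        term += f" AND {organism}[Organism]"
    return term


def parse_esearch(xml_text):
    """Return (count, query_key, webenv) from esearch XML run with usehistory=y."""
    root = ET.fromstring(xml_text)
    # findtext on direct children only, so nested <Count> elements are ignored
    return int(root.findtext("Count")), root.findtext("QueryKey"), root.findtext("WebEnv")


def _get(endpoint, params):
    url = f"{EUTILS}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def download_gene_proteins(gene, out_path, batch_size=500, organism=None):
    search = _get("esearch.fcgi", {"db": "protein", "term": gene_query(gene, organism),
                                   "usehistory": "y", "retmax": 0})
    count, query_key, webenv = parse_esearch(search)
    with open(out_path, "w") as out:
        for start in range(0, count, batch_size):
            out.write(_get("efetch.fcgi", {
                "db": "protein", "query_key": query_key, "WebEnv": webenv,
                "retstart": start, "retmax": batch_size,
                "rettype": "fasta", "retmode": "text",
            }))
            time.sleep(0.4)  # stay under ~3 requests/second without an API key
    return count
```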


3 months ago, genomax (United States) wrote:

Have you looked at NCBI's HomoloGene database? Ensembl also has similar solutions available. If your protein has a standard accession, you should be able to find it in these databases.
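
On the Ensembl side, the REST API has a homology endpoint that can return orthologue protein sequences directly. A sketch, assuming the `homology/symbol/:species/:symbol` endpoint with `type=orthologues`, `sequence=protein` and `aligned=0`, and assuming each homology's `target` object carries `id`, `species`, optionally `protein_id`, and the sequence in `align_seq`; verify these field names against a real response before relying on them. The helper names are hypothetical.

```python
"""Sketch: collect orthologue protein sequences from the Ensembl REST API into one FASTA."""
import json
import urllib.parse
import urllib.request

ENSEMBL = "https://rest.ensembl.org"


def homology_url(species, symbol):
    params = {"type": "orthologues", "sequence": "protein", "aligned": 0}
    path = f"/homology/symbol/{species}/{urllib.parse.quote(symbol)}"
    return f"{ENSEMBL}{path}?{urllib.parse.urlencode(params)}"


def homologies_to_fasta(payload, line_width=60):
    """Convert a parsed homology JSON payload into FASTA text, one record per target."""
    records = []
    for entry in payload.get("data", []):
        for hom in entry.get("homologies", []):
            target = hom["target"]  # assumed field layout, see lead-in
            seq = target.get("align_seq", "").replace("-", "")  # drop gap characters if any
            if not seq:
                continue
            name = target.get("protein_id") or target["id"]
            header = f">{name} {target.get('species', '')}".rstrip()
            lines = [seq[i:i + line_width] for i in range(0, len(seq), line_width)]
            records.append("\n".join([header] + lines))
    return "\n".join(records) + ("\n" if records else "")


def fetch_orthologues(species, symbol):
    req = urllib.request.Request(homology_url(species, symbol),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return homologies_to_fasta(json.load(resp))
```

Usage: `fasta = fetch_orthologues("homo_sapiens", "BRCA2")`, with BRCA2 standing in for your own gene symbol; write the returned text to a `.faa` file to get the single FASTA input for the SSN tool.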
