How can I download a fasta file of all homologs to a given protein?
6.0 years ago
A248 ▴ 30

I have the amino acid sequence of a gene of interest. I want to identify all of its homologs in different species and assemble them into a single FASTA file, which I then want to feed into a sequence similarity network (SSN) tool to see how the homologs from different species cluster.

My problem is: how do I get this list of homologs into one FASTA file? I tried two approaches. The first is to find the protein in the KEGG database and click on 'Orthologs'. This gives all the homologs in the KEGG database (about 2500), but I don't see a way to download the amino acid sequences in one go, short of clicking each homolog individually and copying the aa sequence.
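For the KEGG route, the KEGG REST API might save the clicking. The following is only a minimal sketch, assuming the 'Orthologs' list corresponds to a KEGG Orthology (KO) identifier; the function names and the pause between requests are illustrative choices, not anything KEGG prescribes. It relies on the `link` operation to list the genes in a KO group and on the `get ... /aaseq` operation, which returns amino acid FASTA for up to 10 entries per request.

```python
import time
import urllib.request

KEGG_REST = "https://rest.kegg.jp"


def parse_link_output(text):
    """Extract gene IDs (e.g. 'hsa:3098') from KEGG 'link' output.

    Each non-empty line has two tab-separated fields: KO id, then gene id.
    """
    genes = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip():
            genes.append(parts[1].strip())
    return genes


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def aaseq_url(gene_ids):
    """Build a 'get' URL returning amino acid FASTA for a batch of genes."""
    return f"{KEGG_REST}/get/{'+'.join(gene_ids)}/aaseq"


def fetch_text(url):
    """Download a URL and return the body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def download_ko_fasta(ko_id, out_path, pause=0.5):
    """Write the protein sequences of all genes in one KO group to a file."""
    genes = parse_link_output(fetch_text(f"{KEGG_REST}/link/genes/{ko_id}"))
    with open(out_path, "w") as handle:
        # KEGG 'get' accepts at most 10 entries per request.
        for batch in chunked(genes, 10):
            handle.write(fetch_text(aaseq_url(batch)))
            time.sleep(pause)  # assumed courtesy delay between requests
    return len(genes)
```

Calling `download_ko_fasta("K00844", "kegg_homologs.fasta")` would then write one FASTA file for that KO group; `K00844` is only an example identifier and the output file name is hypothetical.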

The second option is to run BLASTP using RefSeq as the search database. This gives me only a limited number of hits (restricted to 100), and I still can't figure out how to download the FASTA sequences of the hits in one go.

Can someone please offer some help? Thanks in advance!

alignment sequence • 2.7k views

What kind of species are we talking about here? If it's plants (or "related"), you should have a look here: PLAZA. The whole purpose of this resource is exactly homology, gene families, etc.

6.0 years ago
swati.6783 ▴ 10

Hi. You can get your homologs using BLASTP. Hits are restricted to 100 by default, but you can increase this to up to 20000 in the BLASTP web interface, where you can click on the advanced parameters to get the dropdown menu for adjusting the maximum number of hits.

On the results page, just select all hits; there will be options to download the complete FASTA sequences, and you can also download the aligned sequences if you want.
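If you would rather script the download, one possibility is to save the hit table from the results page and fetch the sequences through NCBI's E-utilities. This is only a rough sketch, assuming a tab-separated hit table in the standard tabular layout (subject accession in column 2, comment lines starting with `#`); the function names, batch size, and pause are illustrative assumptions.

```python
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def hit_accessions(hit_table_text):
    """Return unique subject accessions, in order, from a tabular hit table.

    Assumes column 2 holds the subject accession and that comment lines
    start with '#'.
    """
    accessions = []
    for line in hit_table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > 1:
            accessions.append(fields[1].strip())
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(accessions))


def efetch_fasta_url(accessions):
    """Build an efetch URL returning protein FASTA for the given accessions."""
    query = urllib.parse.urlencode({
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def download_hits(hit_table_path, out_path, batch_size=100, pause=0.4):
    """Fetch FASTA for every hit in the table and write one combined file."""
    with open(hit_table_path) as handle:
        accessions = hit_accessions(handle.read())
    with open(out_path, "w") as out:
        for start in range(0, len(accessions), batch_size):
            batch = accessions[start:start + batch_size]
            with urllib.request.urlopen(efetch_fasta_url(batch)) as response:
                out.write(response.read().decode("utf-8"))
            time.sleep(pause)  # keeps this under 3 requests per second
    return len(accessions)
```

Something like `download_hits("hit_table.txt", "blast_hits.fasta")` would then produce a single FASTA file; both file names are hypothetical.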

Hope it helps.

Thanks, Swati

6.0 years ago
Joe 21k

Is this sufficient for what you want to do?

If you have a gene name, you can find all the proteins that are known to match it in the NCBI Protein database, and they can all be downloaded as FASTA CDS files.
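The same search can probably be scripted with NCBI's E-utilities. The following is a minimal sketch, assuming the gene name is searched with the `[Gene Name]` field of the Protein database and that plain protein FASTA (rather than CDS FASTA) is what is wanted; the function names, `retmax` default, and batch size are illustrative assumptions.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(gene_name, retmax=500):
    """Build an esearch URL for Protein records matching a gene name."""
    query = urllib.parse.urlencode({
        "db": "protein",
        "term": f"{gene_name}[Gene Name]",
        "retmax": retmax,
        "retmode": "json",
    })
    return f"{EUTILS}/esearch.fcgi?{query}"


def parse_id_list(esearch_json):
    """Pull the list of record UIDs out of an esearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def efetch_url(uids):
    """Build an efetch URL returning protein FASTA for the given UIDs."""
    query = urllib.parse.urlencode({
        "db": "protein",
        "id": ",".join(uids),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EUTILS}/efetch.fcgi?{query}"


def fetch_text(url):
    """Download a URL and return the body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def download_by_gene_name(gene_name, out_path, retmax=500, batch_size=200):
    """Search by gene name and write all matching sequences to one file."""
    uids = parse_id_list(fetch_text(esearch_url(gene_name, retmax)))
    with open(out_path, "w") as out:
        for start in range(0, len(uids), batch_size):
            out.write(fetch_text(efetch_url(uids[start:start + batch_size])))
            time.sleep(0.4)  # keeps this under 3 requests per second
    return len(uids)
```

For example, `download_by_gene_name("HK1", "hk1_proteins.fasta")` would write the matching records to one file; the gene name and file name here are placeholders.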

6.0 years ago
GenoMax 141k

Have you looked at NCBI's HomoloGene database? Ensembl also has similar solutions available. If your protein has a standard accession, you should be able to find it in these databases.
