Choosing From Several .faa Files In The NCBI FTP Server
13.2 years ago
Alf

Hi everybody.

I am trying to download some bacterial genomes (specifically, the amino-acid FASTA files) from the NCBI FTP server (ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/). For many of the organisms, several ".faa" files are provided.

Which one should I choose? Should I use all of them, concatenated? Any ideas?

Thank you :)

ncbi fasta bacteria
13.2 years ago

Strains often have multiple files because the files represent the main chromosome and any plasmids found in the strain. For instance, *E. coli* O157 has one file for the main chromosome (NC_011353) and one for each of its two plasmids (pO157 and pEC4115).

The plasmids are critical to defining the strain and can carry genes that provide vital functions to the bacterium (for instance, virulence genes are often carried on plasmids). Whether to include them in your analysis therefore depends on your use case, but I would usually recommend it. Unless you need to tell the plasmid-encoded proteins apart from the chromosomal ones, concatenating everything into a single FASTA file probably makes the most sense.
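
For a plain concatenation, running `cat *.faa > combined.fasta` in the strain's directory is enough (the `.fasta` extension keeps the output from matching the `*.faa` glob on a re-run). If you do want to keep track of which replicon each protein came from, one option is to tag every header with its source file name while concatenating. Below is a minimal Python sketch of that, assuming the strain's .faa files have already been downloaded into one local directory; the function name `concatenate_faa` and the ` [source=...]` header tag are hypothetical conventions, not an NCBI format.

```python
from pathlib import Path


def concatenate_faa(directory: str, output: str, tag_source: bool = False) -> int:
    """Concatenate every .faa file in `directory` into one FASTA file.

    With tag_source=True, the source file's stem is appended to each header
    as " [source=<stem>]" (a hypothetical convention, not an NCBI format),
    so chromosomal and plasmid proteins can still be told apart.
    Returns the number of FASTA records written.
    """
    n_records = 0
    out_path = Path(output).resolve()
    with open(out_path, "w") as out:
        # Sorted so the output order is reproducible between runs.
        for faa in sorted(Path(directory).glob("*.faa")):
            if faa.resolve() == out_path:
                continue  # never read the output file back in
            with open(faa) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # drop blank lines between records
                    if line.startswith(">"):
                        n_records += 1
                        if tag_source:
                            line = f"{line} [source={faa.stem}]"
                    out.write(line + "\n")
    return n_records
```

Called as `concatenate_faa("strain_dir", "combined.fasta", tag_source=True)` (where `strain_dir` is a placeholder for your download directory), it writes every protein exactly once and returns the record count, which should equal the sum of the per-file protein counts.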

13.2 years ago

When there are multiple .faa files, it is typically because one contains the chromosome and the others contain plasmids. In that case, it makes sense to use either all of them or only the largest file (which should be the chromosome). However, be careful if you do the latter, since a small number of bacteria have more than one chromosome.
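
If you go with only the largest file, it is worth ranking all the files first rather than just grabbing the top one blindly, so that a second chromosome (as in *Vibrio cholerae*, for example) does not get silently dropped. A minimal Python sketch, assuming the strain's .faa files sit in one local directory and using the number of FASTA records (proteins) as the measure of size; the function names are hypothetical:

```python
from pathlib import Path


def count_proteins(path: Path) -> int:
    """Count FASTA records (lines starting with '>') in one file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def rank_faa_files(directory: str) -> list[tuple[str, int]]:
    """Return (file name, protein count) pairs, largest first."""
    counts = [(p.name, count_proteins(p)) for p in Path(directory).glob("*.faa")]
    return sorted(counts, key=lambda item: item[1], reverse=True)


def largest_faa(directory: str) -> str:
    """Name of the .faa file with the most proteins (usually the chromosome)."""
    ranked = rank_faa_files(directory)
    if not ranked:
        raise FileNotFoundError(f"no .faa files in {directory}")
    return ranked[0][0]
```

If the second entry returned by `rank_faa_files` is comparable in size to the first, check whether it is a second chromosome before discarding it. Ranking by file size (`Path.stat().st_size`) instead of protein count would work much the same way.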
