There are multiple files per strain because each file represents one replicon: either the main chromosome or a plasmid found in that strain. For instance, *E. coli* O157 has one file for the main chromosome (NC_011353) and one for each of its 2 plasmids (pO157 and pEC4115).
The plasmids are critical to defining the strain and can carry genes that provide vital functions to the bacterium (virulence genes, for instance, are often carried on plasmids). Whether to include them in your analysis depends on your use case, but I would usually recommend it. Unless you need to tell the plasmid-encoded proteins apart from those encoded by the main chromosome, concatenating the files into a single FASTA file probably makes the most sense.
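
If it helps, here is a minimal sketch of that concatenation in Python. The function name `concatenate_fasta` and the `tag_source` option are my own inventions for illustration, not part of any library. With `tag_source=True`, each header is prefixed with the name of the file it came from, so you can still tell plasmid proteins from chromosomal ones after merging.

```python
from pathlib import Path


def concatenate_fasta(inputs, output, tag_source=False):
    """Merge several FASTA files into one and return the number of records.

    If tag_source is True, each header becomes ">{file stem}|{original header}"
    so the replicon of origin stays recoverable after merging.
    """
    n_records = 0
    with open(output, "w") as out:
        for path in inputs:
            label = Path(path).stem  # e.g. "pO157" for "pO157.faa"
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                        if tag_source:
                            line = f">{label}|{line[1:]}"
                    out.write(line)
                    last = line
            # Guard against a file with no trailing newline, which would
            # otherwise glue the next header onto the last sequence line.
            if not last.endswith("\n"):
                out.write("\n")
    return n_records
```

A call might look like `concatenate_fasta(["NC_011353.faa", "pO157.faa", "pEC4115.faa"], "EC4115_all.faa", tag_source=True)`, where the file names are placeholders for whatever your downloaded files are actually called. If you do not need to track the source, a plain `cat` of the files does the same job, provided each file ends with a newline.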