Question

How to extract the largest sequence from each genome while keeping all chromosomes when a genome has several

6.2 years ago

Shelle ▴ 30

I have a large number of bacterial FASTA files from NCBI (RefSeq GCF assemblies, *_genomic.fna.gz), and I am planning to extract the largest strain (i.e. the longest sequence) out of each FASTA file. I have noticed that some organisms have a genome split across several chromosomes, so for these cases it is not enough to extract only the largest sequence, since I need all of the chromosomes. Different files have different headers; the first header line of several different FASTA files looks like this:

>NZ_LS483491.1 Staphylococcus auricularis strain NCTC12101 genome assembly, chromosome: 1

>NZ_CP012214.1 Campylobacter jejuni strain CJ088CC52, complete genome

>NZ_CP016324.1 Vibrio cholerae 2740-80 chromosome 1, complete sequence

>NC_013791.2 Bacillus pseudofirmus OF4, complete genome

(This last file has a complete genome, while the others contain complete sequences of some strains.)

I am completely new to sequencing. With a large number of files whose contents differ like this, how can I keep all chromosomes when the headers mention chromosomes, and otherwise extract only the largest sequence when a file's headers do not mention a chromosome?
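A minimal Python sketch of the rule described above, assuming that a header containing the word "chromosome" (like the NZ_LS483491.1 and NZ_CP016324.1 examples) marks one of the genome's chromosomes. If any record in a file matches, all matching records are kept; otherwise only the longest record is kept (e.g. the Bacillus pseudofirmus OF4 "complete genome" file). The output naming (*.selected.fna) is hypothetical. Header wording varies between submitters, so this keyword test is a heuristic; a single-chromosome genome labelled "chromosome, complete genome" also matches, which still returns just that one chromosome and drops any plasmids.

```python
import gzip
import re
from pathlib import Path

# Assumption: a header mentioning "chromosome" (e.g. "chromosome 1" or
# "chromosome: 1") identifies one of the genome's chromosomes.
CHROMOSOME_RE = re.compile(r"\bchromosome\b", re.IGNORECASE)


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(records):
    """Keep every chromosome record if any header mentions a chromosome,
    otherwise keep only the longest record."""
    records = list(records)
    if not records:
        return []
    chromosomes = [r for r in records if CHROMOSOME_RE.search(r[0])]
    if chromosomes:
        return chromosomes
    return [max(records, key=lambda r: len(r[1]))]


def write_fasta(records, path, width=80):
    """Write (header, sequence) pairs as FASTA, wrapping sequences at `width`."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical layout: all *_genomic.fna.gz files in the current directory.
    for fna in sorted(Path(".").glob("*_genomic.fna.gz")):
        selected = select_records(read_fasta(fna))
        write_fasta(selected, fna.name.replace(".fna.gz", ".selected.fna"))
```

Checking the length of each kept record afterwards is a cheap sanity test, since a mislabelled header would otherwise go unnoticed.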

sequence genome chromosome • 1.2k views

updated 5.1 years ago by Biostar 20 • written 6.2 years ago by Shelle ▴ 30

If you wanted only the complete genomes, you should have used the solution here: How to download COMPLETE bacterial genomes from NCBI based on list of names?

6.2 years ago by GenoMax 147k

score 0 · Answer 1 · 2018-09-06

6.2 years ago

GenoMax 147k

"i am planning to extract the largest strain out of fasta files."

That sentence does not make complete sense, but I am going to assume that you want the longest FASTA sequence of the lot, irrespective of the strain name. Take a look at this thread to get that information.
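As a minimal sketch (not necessarily the approach in the linked thread), assuming gzip-compressed *_genomic.fna.gz inputs passed on the command line, this Python script streams each file and reports the single longest sequence across all of them, together with its header and source file. It only accumulates lengths, so whole chromosomes are never held in memory.

```python
import gzip
import sys


def sequence_lengths(path):
    """Yield (header, length) for each record in a gzip-compressed FASTA file,
    counting residues without storing the sequences."""
    header, length = None, 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                if header is not None:
                    yield header, length
                header, length = line[1:].rstrip(), 0
            else:
                length += len(line.strip())
    if header is not None:
        yield header, length


def longest_record(paths):
    """Return (length, header, path) of the longest sequence across all files,
    or None if no records were found. Ties keep the first one seen."""
    best = None
    for path in paths:
        for header, length in sequence_lengths(path):
            if best is None or length > best[0]:
                best = (length, header, path)
    return best


if __name__ == "__main__":
    result = longest_record(sys.argv[1:])
    if result is not None:
        length, header, path = result
        print(f"{path}\t{header}\t{length}")
```

For example, python longest.py *_genomic.fna.gz (the script name is hypothetical) prints one tab-separated line: file, header, length.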

6.2 years ago by GenoMax 147k