How to extract the largest sequence from each genome FASTA while keeping all chromosomes when there are several
6.2 years ago
Shelle ▴ 30

I have a large number of bacterial FASTA files from NCBI (GCF assemblies, files named *_genomic.fna.gz), and I am planning to extract the largest strain out of the FASTA files. I have noticed that some organisms have their genome split across several chromosomes, so in those cases extracting only the largest strain is not enough: I need all of the chromosomes. The files have different headers; the header on the first line of several FASTA files looks like this:

>NZ_LS483491.1 Staphylococcus auricularis strain NCTC12101 genome assembly, chromosome: 1

>NZ_CP012214.1 Campylobacter jejuni strain CJ088CC52, complete genome

>NZ_CP016324.1 Vibrio cholerae 2740-80 chromosome 1, complete sequence

>NC_013791.2 Bacillus pseudofirmus OF4, complete genome

(This last file has a complete genome, while the others are complete sequences of some strains.)

I am completely new to sequencing. Given a large number of files whose contents differ like this, how can I keep all of the chromosomes when the headers mention "chromosome", and otherwise extract only the largest sequence when the headers do not mention a chromosome?
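A minimal sketch of the selection described in the question, in Python (the thread contains no code, so the language is my choice). It rests on a heuristic assumption: that the word "chromosome" in a header marks a chromosome record, as in the Staphylococcus and Vibrio headers above, and that records whose header contains "plasmid" should be dropped. If no header mentions a chromosome, it falls back to the single longest record. The names read_fasta, select_records and CHROM_RE are hypothetical helpers, not part of any library.

```python
import gzip
import re
from typing import Iterator, List, Tuple

# ASSUMPTION (heuristic): the word "chromosome" anywhere in a header
# (e.g. "chromosome 1", "chromosome: 1") marks a chromosome record.
CHROM_RE = re.compile(r"\bchromosome\b", re.IGNORECASE)


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []  # drop the leading ">"
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep every chromosome record if any header mentions 'chromosome'
    (excluding plasmids); otherwise keep only the longest record."""
    chromosomes = [
        (h, s) for h, s in records
        if CHROM_RE.search(h) and "plasmid" not in h.lower()
    ]
    if chromosomes:
        return chromosomes
    return [max(records, key=lambda r: len(r[1]))] if records else []
```

For example, select_records(list(read_fasta(path))) on a Vibrio cholerae file with "chromosome 1" and "chromosome 2" headers would keep both records, while a file whose headers only say "complete genome" would keep just its longest record. Because it relies on free-text header wording, it is worth spot-checking the output against the assembly report for a few files.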


If you wanted only the complete genomes, you should have used the solution here: How to download COMPLETE bacterial genomes from NCBI based on list of names?

6.2 years ago
GenoMax 147k

i am planning to extract the largest strain out of fasta files.

That sentence does not make complete sense, but I am going to assume that you want the longest FASTA sequence of the lot, irrespective of the strain name. Take a look at this thread to see how to get that.
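A rough Python sketch of that interpretation (the longest sequence across all files, ignoring strain names), assuming the downloaded files sit in one directory and match the glob pattern *_genomic.fna.gz. The helper names record_lengths and longest_overall are hypothetical. It only counts sequence lengths line by line rather than storing whole sequences, which keeps memory use low when scanning many genomes.

```python
import glob
import gzip
from typing import Iterator, Optional, Tuple


def record_lengths(path: str) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence length) for each record in a plain or
    gzipped FASTA file without keeping the sequences in memory."""
    opener = gzip.open if path.endswith(".gz") else open
    header, length = None, 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, length
                header, length = line[1:], 0
            else:
                length += len(line)
    if header is not None:
        yield header, length


def longest_overall(pattern: str) -> Optional[Tuple[str, str, int]]:
    """Return (file, header, length) of the longest record across all
    files matching the glob pattern, or None if nothing matches."""
    best = None
    for path in sorted(glob.glob(pattern)):
        for header, length in record_lengths(path):
            if best is None or length > best[2]:
                best = (path, header, length)
    return best


if __name__ == "__main__":
    # Hypothetical pattern: adjust to wherever the GCF files were downloaded.
    print(longest_overall("*_genomic.fna.gz"))
```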
