Get Fasta file of a genome using only contig name?
21 months ago

Hello,

I was given a list of 2000+ unique contig names. I need to find out which bacterial genome each contig belongs to and download the whole genome of that bacterium for as many of them as I can. The only approach I can think of is to search NCBI for each contig one at a time. Is there a faster way to do this? I understand if this is a weird question to ask.

genome sequence • 392 views

Please post an example or two of the names.

BBFO01000001.1
CP022915.1


Here are some examples. They are GenBank accessions, and I would like the genomes in FASTA format. Since the contig names I just posted are part of the website address, I'm thinking I could write a script that calls wget for each entry in my list?

21 months ago
wget -O out.fa   "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=BBFO01000001.1,CP022915.1&rettype=fasta&retmode=text"
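
For a list of 2000+ accessions, the same efetch request can be sent in batches rather than once per contig. Below is a minimal sketch in Python, assuming the accessions sit in a text file with one per line. The function names, the file names `contigs.txt` and `contigs.fa`, the batch size of 200 and the 0.4 second pause are illustrative choices, not something from this thread; the pause is there because NCBI asks for no more than three requests per second without an API key.

```python
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(path):
    """Return the non-empty lines of a file, de-duplicated, in original order."""
    with open(path) as handle:
        unique = dict.fromkeys(line.strip() for line in handle if line.strip())
    return list(unique)


def batches(items, size):
    """Yield consecutive slices of `items` holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_url(accessions):
    """Build the same kind of efetch URL as the wget command above."""
    query = urllib.parse.urlencode(
        {
            "db": "nuccore",
            "id": ",".join(accessions),
            "rettype": "fasta",
            "retmode": "text",
        },
        safe=",",  # keep the commas between accessions unescaped
    )
    return f"{EFETCH}?{query}"


def download_fasta(accessions, out_path, batch_size=200, pause=0.4):
    """Fetch FASTA records batch by batch and append them to one file."""
    with open(out_path, "wb") as out:
        for batch in batches(accessions, batch_size):
            with urllib.request.urlopen(efetch_url(batch)) as response:
                out.write(response.read())
            time.sleep(pause)  # assumed pause; stays under 3 requests/second


# Example call (hypothetical file names):
# download_fasta(read_accessions("contigs.txt"), "contigs.fa")
```

This does no retrying or error handling, so a failed batch stops the run with an exception. Comparing the number of `>` header lines in the output against the number of input accessions is a quick way to check that nothing was skipped.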


Nice, I was just typing up an edit describing how I thought I could do it. Thanks for confirming!


I was able to download the FASTA for all the contigs I was looking for! However, is it also possible to download the whole genome assembly directories of these bacterial strains, given the contig names I posted earlier?


Take a look at this. I doubt you need full directories.