Question: Download genomes within a given GC content interval
genomes_and_MGEs wrote, 10 days ago:

Hey guys,

Does anyone know how to download only complete genomes with a given GC content from NCBI? For example, all complete genomes with a GC content between 40% and 50%. Thank you!

genomax wrote, 10 days ago:

You can find genome reports for various organisms on the NCBI FTP site, under ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/.

Let us get the prokaryotic genome report, prokaryotes.txt.

If you parse this tab-separated file (column 8 is GC%, column 21 is the FTP path), you can get the genomes where GC% is between 40 and 50:

$ awk -F '\t' '{if ($8 >= 40 && $8 <= 50) print $1,"\t",$21}' prokaryotes.txt | head -5
Yersinia pestis CO92     ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/065/GCA_000009065.1_ASM906v1
Tropheryma whipplei str. Twist   ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/007/485/GCA_000007485.1_ASM748v1
Actinobacillus pleuropneumoniae serovar 5b str. L20      ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/015/885/GCA_000015885.1_ASM1588v1
Chlamydia pneumoniae CWL029      ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/008/745/GCA_000008745.1_ASM874v1
Vibrio vulnificus        ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/215/135/GCA_002215135.1_ASM221513v1

In each of those directories you can find a *_genomic.fna.gz file with the genome sequence.

This variation should get you all the way to downloadable URLs:

$ awk -F '\t' '$8 >= 40 && $8 <= 50 {n = split($21, p, "/"); print $21 "/" p[n] "_genomic.fna.gz"}' prokaryotes.txt | head -5
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/065/GCA_000009065.1_ASM906v1/GCA_000009065.1_ASM906v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/007/485/GCA_000007485.1_ASM748v1/GCA_000007485.1_ASM748v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/015/885/GCA_000015885.1_ASM1588v1/GCA_000015885.1_ASM1588v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/008/745/GCA_000008745.1_ASM874v1/GCA_000008745.1_ASM874v1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/215/135/GCA_002215135.1_ASM221513v1/GCA_002215135.1_ASM221513v1_genomic.fna.gz
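
The question asked for complete genomes only, which the commands above do not check. Here is a minimal sketch of one way to add that check. It assumes that column 16 of prokaryotes.txt holds the assembly status and that finished assemblies are labeled `Complete Genome`; both are worth verifying against the header line of your copy of the file. `gc_filter_urls` is a hypothetical helper name, not something NCBI provides.

```shell
# Sketch: list FASTA URLs for complete genomes whose GC% lies in [min, max].
# Assumed column layout of prokaryotes.txt (check the header line first):
#   8 = GC%, 16 = Status, 21 = FTP Path
gc_filter_urls() {
    # $1 = genome report file, $2 = minimum GC%, $3 = maximum GC%
    awk -F '\t' -v min="$2" -v max="$3" '
        $16 == "Complete Genome" && $21 ~ /^ftp:/ &&
        $8 ~ /^[0-9.]+$/ && $8 >= min && $8 <= max {
            n = split($21, part, "/")   # last path element is the assembly name
            print $21 "/" part[n] "_genomic.fna.gz"
        }
    ' "$1"
}
```

For example, `gc_filter_urls prokaryotes.txt 40 50 > urls.txt` writes one URL per line, and `wget -i urls.txt` then fetches every file in that list. Rows with a missing GC% or FTP path are skipped rather than turned into broken URLs.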

Thanks, really appreciate that!

Reply written 10 days ago by genomes_and_MGEs