How to change GCA_ to NC_ annotations (RefSeq)
10 weeks ago
beginner123 ▴ 20

I downloaded the data_summary.tsv file via NCBI Datasets, but I need to change the GCA_-style assembly accessions in the file to NC_ accessions in order to create a RefSeq (NC_) reference list. Is there any way to convert an assembly accession to the corresponding RefSeq sequence accession numbers?

GCA accessions are GenBank assemblies, whereas the corresponding GCF accessions (if they exist) are RefSeq assemblies.

One way to convert these is with Entrez Direct:

$ esearch -db assembly -query GCF_000266945 | elink -target nuccore | efetch -format acc
NC_018026.1
NC_018025.1
CP003361.1
CP003360.1
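
Since the output mixes RefSeq (NC_) and GenBank (CP) accessions, you would probably want to filter it if only the NC_ entries are needed. A minimal sketch, assuming every accession you want starts with NC_; keep_nc is just a name made up for this example, and the sample input is copied from the output above so it runs without network access:

```shell
# keep_nc: hypothetical helper, passes through only lines that start with NC_
keep_nc() {
    grep '^NC_'
}

# Sample input copied from the efetch output above.
# In practice you would pipe the efetch command into keep_nc instead.
printf '%s\n' NC_018026.1 NC_018025.1 CP003361.1 CP003360.1 | keep_nc
# prints:
# NC_018026.1
# NC_018025.1
```

Note that this drops other RefSeq prefixes as well, so loosen the pattern if you need those too.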

looks like we wrote the exact same thing at the same time :-) - just took a detour in investigating the assembly stats

10 weeks ago

Interesting question. I assumed the assembly summary would have that, but it turns out it does not. After some trial and error, it seems you can link the two using Entrez Direct:

esearch -db assembly -query GCA_009858895 | elink -target nuccore | efetch -format acc


prints:

NC_045512.2
MN908947.3


It shows both the RefSeq (NC_045512.2) and the GenBank (MN908947.3) entries for the same sequence.
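
For a whole list of assemblies, the same command could be wrapped in a loop. This is only a sketch under a few assumptions: the assembly accessions have already been pulled out of data_summary.tsv into one accession per line (the thread does not say which column holds them), and the function names fetch_linked_accessions and map_assemblies_to_nc are invented here. The `< /dev/null` is there because, as far as I know, the EDirect tools read standard input and would otherwise swallow the rest of the list inside the loop.

```shell
# Hypothetical helper: print the nuccore accessions linked to one assembly accession.
# stdin is redirected so esearch does not consume the caller's input list.
fetch_linked_accessions() {
    esearch -db assembly -query "$1" < /dev/null |
        elink -target nuccore |
        efetch -format acc
}

# Read assembly accessions (one per line) on stdin and
# write "assembly<TAB>NC_accession" for every linked NC_ record.
map_assemblies_to_nc() {
    while IFS= read -r asm; do
        # skip blank lines
        [ -z "$asm" ] && continue
        fetch_linked_accessions "$asm" | grep '^NC_' |
            while IFS= read -r acc; do
                printf '%s\t%s\n' "$asm" "$acc"
            done
    done
}
```

You would then run something like `map_assemblies_to_nc < assemblies.txt > nc_map.tsv`, where both file names are placeholders. Assemblies without any NC_ record simply produce no line in the output.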