How to change GCA_ to NC_ annotations (RefSeq)
10 weeks ago
beginner123 ▴ 20

I downloaded the data_summary.tsv file via NCBI Datasets, but I need to change the GCA_-style annotations in the file to NC_ annotations in order to create a RefSeq (NC_) reference list. Is there any way to convert an assembly accession to the corresponding RefSeq accession numbers?

NCBI RefSeq • 357 views

GCA accessions are GenBank assemblies, whereas the corresponding GCF accessions (if they exist) are RefSeq.

One way to convert these is with Entrez Direct:

$ esearch -db assembly -query GCF_000266945 | elink -target nuccore | efetch -format acc
NC_018026.1
NC_018025.1
CP003361.1
CP003360.1


Looks like we wrote the exact same thing at the same time :-) I just took a detour investigating the assembly stats.

10 weeks ago

Interesting question. I assumed the assembly summary would have that, but it turns out it does not. After some trial and error, it seems you can link these up in the following way using Entrez Direct:

esearch -db assembly -query GCA_009858895 | elink -target nuccore | efetch -format acc


prints:

NC_045512.2
MN908947.3


It shows both the RefSeq and the GenBank entries for the same data.
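
If you only want the NC_ entries, and for a whole list of assemblies rather than one, something along these lines might work. It is a rough sketch, not a tested pipeline: the function names (keep_nc, assembly_to_nc, convert_list) are made up for illustration, and it assumes you have already pulled the assembly accessions out of data_summary.tsv into a plain text file with one accession per line. Note that filtering on NC_ drops other RefSeq prefixes such as NZ_, so adjust the pattern if you need those.

```shell
# Keep only lines that start with NC_ (RefSeq complete genomic molecules).
# "|| true" stops grep's non-zero exit status when nothing matches.
keep_nc() {
    grep '^NC_' || true
}

# Print the NC_ accessions linked to one assembly accession.
# Needs Entrez Direct (esearch, elink, efetch) on PATH and network access.
# </dev/null keeps esearch from swallowing the caller's stdin inside a loop.
assembly_to_nc() {
    esearch -db assembly -query "$1" </dev/null |
        elink -target nuccore |
        efetch -format acc |
        keep_nc
}

# Read assembly accessions (one per line) from the file given as $1 and
# print tab-separated pairs: assembly accession, NC_ accession.
convert_list() {
    while read -r acc; do
        [ -n "$acc" ] || continue
        assembly_to_nc "$acc" | awk -v a="$acc" '{ print a "\t" $0 }'
    done < "$1"
}
```

You would then call it as `convert_list accessions.txt > gca_to_nc.tsv`, where accessions.txt is a hypothetical file name. Assemblies with no RefSeq counterpart simply produce no rows, so it is worth comparing the output against the input list afterwards.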