How to change GCA_ to NC_ annotations (RefSeq)
22 months ago
beginner123 ▴ 30

I downloaded the data_summary.tsv file via NCBI Datasets, but I need to change the GCA_ style annotations in the file to NC_ annotations in order to create a RefSeq (NC_) reference list. Is there any way to convert RefSeq Assembly to RefSeq accession number?

NCBI RefSeq • 823 views

It would be helpful if you could post what you have downloaded and what you want.


GCA accessions are GenBank assemblies, whereas the corresponding GCF accessions (if they exist) are RefSeq assemblies.

One way to convert these would be using Entrez Direct:

$ esearch -db assembly -query GCF_000266945 | elink -target nuccore | efetch -format acc
NC_018026.1
NC_018025.1
CP003361.1
CP003360.1
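
Since the question asks only for NC_ accessions, the list above can be filtered by prefix. A minimal sketch, assuming plain grep is enough for this; `keep_nc` is a hypothetical helper name, not part of Entrez Direct. With the four accessions shown above it would keep NC_018026.1 and NC_018025.1.

```shell
# Hypothetical helper: keep only accessions that start with NC_.
# Reads accessions on stdin, one per line.
# "|| true" avoids a non-zero exit status when nothing matches.
keep_nc() {
    grep '^NC_' || true
}

# Example use (needs Entrez Direct and network access):
#   esearch -db assembly -query GCF_000266945 | elink -target nuccore | efetch -format acc | keep_nc
```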

Looks like we wrote the exact same thing at the same time :-) I just took a detour investigating the assembly stats.

22 months ago

Interesting question. I assumed the assembly summary would have that, but it turns out it does not. After some trial and error, it seems you can link the two in the following way using Entrez Direct:

esearch -db assembly -query GCA_009858895 | elink -target nuccore | efetch -format acc

prints:

NC_045512.2
MN908947.3

It shows both the RefSeq and the GenBank entries for the same data.
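
For a whole list of assemblies, the same lookup could be wrapped in a loop. This is only a sketch under assumptions: `lookup_acc` and `map_assemblies` are hypothetical names, and the input is assumed to be a plain list of assembly accessions, one per line (the column layout of data_summary.tsv is not shown in this thread, so extracting that column is left out).

```shell
# Look up nuccore accessions for one assembly accession.
# Same Entrez Direct pipeline as above; needs network access.
lookup_acc() {
    esearch -db assembly -query "$1" | elink -target nuccore | efetch -format acc
}

# Read assembly accessions from stdin (one per line) and print
# "assembly<TAB>NC_accession" for every linked NC_ record.
map_assemblies() {
    while IFS= read -r asm; do
        [ -z "$asm" ] && continue    # skip blank lines
        # </dev/null stops the lookup from swallowing the rest of the input list
        lookup_acc "$asm" < /dev/null | while IFS= read -r acc; do
            case "$acc" in
                NC_*) printf '%s\t%s\n' "$asm" "$acc" ;;
            esac
        done
    done
}
```

With the accessions in a file such as assemblies.txt (hypothetical name), this would be run as `map_assemblies < assemblies.txt > nc_list.tsv`. Assemblies without any linked NC_ record simply produce no output line.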
