Question: Mapping RefSeq accession to assembly ID
lironyoffe wrote (3.9 years ago):

Hi,

Is there a way to map a RefSeq accession to its assembly ID? I need to do this for all the bacteria on the RefSeq FTP site.

For example: NZ_KN150745.1 -> GCF_001640985.1

Thanks

Sej Modha wrote (3.9 years ago):

A Unix command-line solution using NCBI Entrez Direct (EDirect) would be:

elink -db nuccore -id "NZ_KN150745.1" -target assembly | esummary | xtract -pattern DocumentSummary -element AssemblyAccession

5heikki wrote (3.9 years ago):

The fastest solution, besides parsing the sequence files that you might have already downloaded, would be to download the *assembly_report.txt files and parse the accessions from there, e.g.:

# Column 7 of the assembly report is RefSeq-Accn; lines starting with '#' are header lines
awk -F'\t' '!/^#/ {print $7}' GCF_000754995.1_PVA_assembly_report.txt
NZ_KN150745.1
NZ_KN150746.1
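
To cover many assemblies at once, the same parsing can be wrapped in a loop that prints each RefSeq accession next to the assembly it came from. This is only a sketch under a few assumptions: the function name map_reports is made up for illustration, the report files keep names like GCF_000754995.1_PVA_assembly_report.txt (so the assembly accession is the first two "_"-separated parts of the file name), and rows with "na" in column 7 carry no RefSeq accession and can be skipped.

```shell
# Print "RefSeq accession <TAB> assembly accession" for every sequence
# listed in the given *_assembly_report.txt files.
# map_reports is a hypothetical helper name, not part of any NCBI tool.
map_reports() {
    for report in "$@"; do
        # Assumption: file name starts with the assembly accession,
        # e.g. GCF_000754995.1_PVA_assembly_report.txt -> GCF_000754995.1
        assembly=$(basename "$report" | cut -d_ -f1,2)
        # Skip '#' header lines and rows without a RefSeq accession ("na")
        awk -v asm="$assembly" 'BEGIN { FS = OFS = "\t" }
            !/^#/ && $7 != "" && $7 != "na" { print $7, asm }' "$report"
    done
}
```

It could then be used roughly like this, writing the table once and looking accessions up in it afterwards:

map_reports *_assembly_report.txt > refseq2assembly.tsv
awk -F'\t' '$1 == "NZ_KN150745.1" {print $2}' refseq2assembly.tsv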