Assembly Accession Number
7.9 years ago
lizabe ▴ 10

Hi everybody, could anybody explain to me why these nucleotide entries in the GenBank database don't have an assembly accession number?

http://www.ncbi.nlm.nih.gov/nuccore/NC_023274.1?from=4015&to=4638&strand=2
http://www.ncbi.nlm.nih.gov/nuccore/NC_001735.4?from=34478&to=35101&strand=2

and this one does?

http://www.ncbi.nlm.nih.gov/nuccore/NZ_JTZL01000048.1?from=10765&to=11388

I am trying to understand this because I have to assign assembly accession numbers to a list of nucleotide accession numbers, and some of them don't have this data.

Thanks!

ncbi Assembly genbank • 3.2k views

The first two entries are RefSeq records and are validated/curated (note the NC_ accession prefix). The third entry is from a WGS dataset. It was automatically annotated by NCBI's prokaryotic annotation pipeline (as the notes in the record indicate).


Thank you for the answer! So the first two entries are not part of the Assembly database, are they? This is a problem for me: if I want to know how many genomes of a species there are in NCBI and I search the Assembly database, I miss entries like these two. How can I get the exact number of genomes of a given species, regardless of the assembly level or whether they are curated?


Are you referring to the "WGS" section of GenBank as the "assembly database"? RefSeq sequences are just that: references that are stable and curated.

A list of all genomes in NCBI is in this file. A similar list is available for RefSeq genomes as well.

7.9 years ago
piet ★ 1.8k

There is absolutely no need for a given sequence in Genbank to have an associated 'assembly'. NC_023274.1 was derived from KF840720.1, which is a 38 kb sequence presumably submitted 'as is' and which is also lacking biosample and bioproject links.

NZ_JTZL00000000.1 is a set of WGS sequences. GenBank treats WGS sequences differently in many details. WGS sequences usually have an associated BioSample, BioProject and 'assembly'. Nevertheless, several WGS sequences have been submitted without the sequencing reads used to construct them.
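
To check programmatically which nucleotide accessions have an associated assembly, one option is the NCBI E-utilities ELink service, asking for links from nuccore to the assembly database. The following is a minimal sketch, not a definitive implementation. The function names are hypothetical, and it assumes that a record without an assembly simply returns no links, as would be expected for entries such as NC_023274.1. The values returned are Assembly UIDs, not GCA_/GCF_ accessions.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public E-utilities endpoint for cross-database links.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(accession):
    """Build an ELink URL asking for assembly records linked to a nuccore accession."""
    params = {"dbfrom": "nuccore", "db": "assembly", "id": accession}
    return ELINK + "?" + urllib.parse.urlencode(params)


def parse_assembly_uids(xml_text):
    """Return the Assembly UIDs found in an ELink XML response (empty list if none)."""
    root = ET.fromstring(xml_text)
    return [
        link_id.text
        for link_set in root.iter("LinkSetDb")
        if link_set.findtext("DbTo") == "assembly"
        for link_id in link_set.findall("Link/Id")
    ]


def assembly_uids_for(accession):
    """Query NCBI (network access required) and return linked Assembly UIDs."""
    with urllib.request.urlopen(build_elink_url(accession)) as resp:
        return parse_assembly_uids(resp.read())
```

Calling assembly_uids_for on each accession in a list and collecting the ones that return an empty list would show which entries have no assembly. Turning a UID into an assembly accession needs a further ESummary request against the assembly database, which is left out here.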

"How can I get the exact number of genomes of a given species, regardless of the assembly level or whether they are curated?"

You have to cover two cases: sequences with status 'complete' and WGS master records. Both can be combined in a single query (note that Entrez expects Boolean operators in upper case):

"Pseudomonas aeruginosa"[Organism] AND (complete[Properties] OR "wgs master"[Properties])

see also A: Efetch For Fully Sequenced Microbial Genomes?
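
The Entrez query above can also be run from a script through the NCBI E-utilities ESearch service, which reports the number of matching records. This is a minimal sketch under stated assumptions: the function names are hypothetical, the search runs against the nuccore database, and the count is the number of nucleotide records matching the query, which may not be exactly the number of genomes (complete plasmid records, for example, could also match).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public E-utilities endpoint for Entrez searches.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(organism):
    """Build the Entrez query from the answer for a given organism name."""
    return (
        f'"{organism}"[Organism] AND '
        '(complete[Properties] OR "wgs master"[Properties])'
    )


def build_count_url(term, db="nuccore"):
    """Build an ESearch URL; retmax=0 asks for the count without listing IDs."""
    params = {"db": db, "term": term, "retmax": 0}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Read the top-level <Count> element from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def count_records(organism):
    """Query NCBI (network access required) and return the number of hits."""
    with urllib.request.urlopen(build_count_url(build_term(organism))) as resp:
        return parse_count(resp.read())
```

For example, count_records("Pseudomonas aeruginosa") would return the current number of hits for that species. The number changes as NCBI adds records, so no expected value is given here.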
