Question

Assembly Accession Number

7.9 years ago

lizabe

Hi everybody, could anybody explain to me why these nucleotide entries in the GenBank database don't have an assembly accession number?

http://www.ncbi.nlm.nih.gov/nuccore/NC_023274.1?from=4015&to=4638&strand=2

http://www.ncbi.nlm.nih.gov/nuccore/NC_001735.4?from=34478&to=35101&strand=2

while this one does?

http://www.ncbi.nlm.nih.gov/nuccore/NZ_JTZL01000048.1?from=10765&to=11388

I am trying to understand this because I have to assign assembly accession numbers to a list of nucleotide accession numbers, and some of them don't have this data.

Thanks!
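A minimal sketch of the lookup described above, using NCBI's E-utilities ELink service to go from nuccore records to assembly records. It only builds the request URL and parses the JSON reply, so it can be tried offline. The helper names are hypothetical, the input IDs are assumed to be nuccore UIDs, and the JSON layout (`linksets`, `linksetdbs`, `linkname`) is assumed from ELink's JSON output. Records with no link, like the NC_ entries above, come back with an empty list.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ELink endpoint.
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(nuccore_ids):
    """Build an ELink request from nuccore to assembly (hypothetical helper).

    Each ID is passed as its own &id= parameter so that ELink returns one
    linkset per input ID and results can be matched back to their source.
    """
    params = [("dbfrom", "nuccore"), ("db", "assembly"), ("retmode", "json")]
    params += [("id", i) for i in nuccore_ids]
    return ELINK_URL + "?" + urlencode(params)


def parse_elink_json(data):
    """Map each source ID to the list of assembly UIDs linked to it.

    Assumes the JSON layout of ELink replies; IDs without a
    nuccore_assembly link map to an empty list.
    """
    result = {}
    for linkset in data.get("linksets", []):
        for source_id in linkset.get("ids", []):
            links = []
            for linksetdb in linkset.get("linksetdbs", []):
                if linksetdb.get("linkname") == "nuccore_assembly":
                    links.extend(linksetdb.get("links", []))
            result[source_id] = links
    return result
```

ELink returns assembly UIDs rather than GCA_/GCF_ accessions; turning them into accessions would need a further lookup (for example an ESummary call on the assembly database), which is not shown here.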

Tags: ncbi, Assembly, genbank

written 7.9 years ago by lizabe • updated 7.9 years ago by piet

The first two entries are from RefSeq and are validated/curated (note the NC_ accession prefix). The third entry is from a WGS dataset. It was automatically annotated by NCBI's prokaryotic annotation pipeline (as the notes indicate).
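The accession prefixes pointed out here can be checked mechanically. A rough sketch, assuming only the accession formats that appear in this thread: NC_ (RefSeq complete genomic molecule), NZ_ (RefSeq genomic record built from WGS data), GenBank WGS accessions with 4 to 6 letters followed by digits, and older GenBank accessions with 1 to 2 letters and 5 to 6 digits. The function name and regular expressions are illustrative simplifications, not NCBI's complete accession rules.

```python
import re

# Only the RefSeq prefixes discussed in this thread.
REFSEQ_PREFIXES = {
    "NC_": "RefSeq complete genomic molecule",
    "NZ_": "RefSeq genomic record built from WGS data",
}

# Simplified GenBank WGS accession: 4-6 letters, then 8+ digits
# (2-digit assembly version + contig number), optional .version.
WGS_PATTERN = re.compile(r"^[A-Z]{4,6}(\d{8,})(\.\d+)?$")
# Simplified classic GenBank nucleotide accession: 1-2 letters + 5-6 digits.
GENBANK_PATTERN = re.compile(r"^[A-Z]{1,2}\d{5,6}(\.\d+)?$")


def classify_accession(acc):
    """Return a rough record type for a nucleotide accession (hypothetical helper)."""
    acc = acc.strip().upper()
    for prefix, label in REFSEQ_PREFIXES.items():
        if acc.startswith(prefix):
            return label
    m = WGS_PATTERN.match(acc)
    if m:
        # A WGS master record has an all-zero number, e.g. JTZL00000000.
        return "GenBank WGS master record" if int(m.group(1)) == 0 else "GenBank WGS contig"
    if GENBANK_PATTERN.match(acc):
        return "GenBank nucleotide record"
    return "unknown"
```

For example, `classify_accession("KF840720.1")` reports a plain GenBank nucleotide record, while `classify_accession("NZ_JTZL01000048.1")` reports a WGS-based RefSeq record.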

7.9 years ago by GenoMax

Thank you for the answer! So the first two entries are not part of the Assembly database, are they? This is a problem for me: if I want to know how many genomes of a species there are in NCBI and I search the Assembly database, I miss entries like these two. How can I get the exact number of genomes of a certain species, regardless of the assembly level or whether they are curated?

7.9 years ago by lizabe

Are you referring to the "WGS" section of GenBank as the "assembly database"? RefSeq sequences are just that: references that are stable and curated.

A list of all genomes in NCBI is in this file. A similar list is available for RefSeq genomes as well.
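A sketch of counting one species in a genome list like the one mentioned here. The linked file is not preserved in this thread, so the code assumes a tab-separated table with a header row (optionally starting with '#') and a column named `organism_name`; both the column name and the `count_genomes` function are assumptions to adapt to the actual file.

```python
import csv


def count_genomes(tsv_text, species, organism_column="organism_name"):
    """Count rows whose organism column starts with the given species name.

    Hypothetical helper: assumes a tab-separated table whose header row
    names the columns; comment lines before the header are skipped.
    """
    lines = tsv_text.splitlines()
    # Locate the header row: the first line that names the organism column.
    for i, line in enumerate(lines):
        fields = line.lstrip("#").strip().split("\t")
        if organism_column in fields:
            break
    else:
        raise ValueError(f"no header row with column {organism_column!r}")

    reader = csv.DictReader(lines[i + 1:], fieldnames=fields, delimiter="\t")
    prefix = species.lower()
    # Case-insensitive prefix match, so strain-level names are included.
    return sum(
        1 for row in reader
        if (row.get(organism_column) or "").lower().startswith(prefix)
    )
```

Matching by name prefix also counts strain-level names such as "Pseudomonas aeruginosa PAO1"; filtering on a taxonomy ID column, if the file has one, would be stricter.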

7.9 years ago by GenoMax

Answer · 2016-06-09

There is absolutely no need for a given sequence in GenBank to have an associated 'assembly'. NC_023274.1 was derived from KF840720.1, a 38 kb sequence that was presumably submitted 'as is' and that also lacks BioSample and BioProject links.

NZ_JTZL00000000.1 is a set of WGS sequences. GenBank treats WGS sequences differently in many respects. WGS sequences usually have an associated BioSample, BioProject and 'assembly'. Nevertheless, several WGS sequences have been submitted without the sequencing reads used to construct them.
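Whether a record carries BioProject, BioSample or assembly links shows up in the DBLINK section of its GenBank flat file. A minimal parser, assuming the standard flat-file layout in which section keywords occupy the first 12 columns and continuation lines are indented by 12 spaces; `parse_dblink` is a hypothetical helper, not part of any NCBI tool.

```python
def parse_dblink(flatfile_text):
    """Return the DBLINK cross-references of one GenBank flat-file record.

    Result looks like {"BioProject": ["PRJNA..."], "BioSample": ["SAMN..."]}.
    A record without a DBLINK section gives an empty dict.
    """
    links = {}
    in_dblink = False
    for line in flatfile_text.splitlines():
        keyword = line[:12].strip()
        if keyword:
            # A non-blank keyword column starts a new section.
            in_dblink = keyword == "DBLINK"
        if not in_dblink:
            continue
        value = line[12:].strip()
        if ":" in value:
            db, _, ids = value.partition(":")
            # Several IDs for one database may be comma-separated.
            links.setdefault(db.strip(), []).extend(
                i.strip() for i in ids.split(",") if i.strip()
            )
    return links
```

A record submitted without these links, as described above for KF840720.1, would be expected to yield an empty dict.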

> How can I get the exact number of genomes, no matter the level of assembly or if it is curated or not, of a certain species?

You have to send two queries, one for all sequences with status 'complete' and another for WGS sequences, or combine both into a single query:

"Pseudomonas aeruginosa"[Organism] AND (complete[Properties] OR "wgs master"[Properties])
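The query above can also be sent to ESearch so that only the hit count comes back. A sketch, assuming the `nuccore` database (the answer does not name one) and ESearch's `rettype=count` option; `genome_query` and `esearch_count_url` are hypothetical names. Entrez expects Boolean operators in upper case, so `OR` is used.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def genome_query(organism):
    """Complete sequences OR WGS master records for one organism."""
    return (f'"{organism}"[Organism] AND '
            '(complete[Properties] OR "wgs master"[Properties])')


def esearch_count_url(term, db="nuccore"):
    """Build an ESearch URL that asks only for the number of hits.

    db="nuccore" is an assumption; change it if another database is meant.
    """
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)
```

Keeping the query builder separate from the URL builder makes it easy to run the two sub-queries mentioned in the answer on their own and compare their counts with the combined one.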

see also A: Efetch For Fully Sequenced Microbial Genomes?