Discrepancy in Number of SRA between NCBI Website and BigQuery service (SQL request)
6 weeks ago
marie.harmel ▴ 10


I recently came across an inconsistency between the number of Sequence Read Archive (SRA) datasets reported on the NCBI website and the count obtained through a SQL query on BigQuery.

As of February 2024, the NCBI website displays a total of 27,102,173 SRA datasets available.

However, when running the following SQL query on BigQuery:

SELECT DISTINCT m.acc, m.sample_acc, m.biosample, m.sra_study, m.bioproject 
FROM `nih-sra-datastore.sra.metadata` as m,
`nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax 
WHERE m.acc=tax.acc and m.bioproject IS NOT NULL 
ORDER BY m.bioproject, m.sra_study, m.biosample, m.sample_acc

I obtain 25,636,505 SRA datasets, which is 1,465,668 fewer than the website total.

I am curious to know if this difference in numbers could be attributed to the timing of updates between the NCBI databases on BigQuery and those accessible directly through the NCBI website.
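
Besides update timing, I suspect the query itself may account for part of the gap: the join keeps only runs that also appear in the tax_analysis table, and the filter drops runs without a BioProject. Below is a minimal sketch of that effect using Python's sqlite3 on made-up data. The table names `metadata` and `tax_analysis`, the `tax_id` column, and every row are illustrative assumptions, not the actual BigQuery schema or contents.

```python
import sqlite3

# In-memory toy database; all tables and rows here are hypothetical stand-ins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (acc TEXT, bioproject TEXT);
CREATE TABLE tax_analysis (acc TEXT, tax_id INTEGER);
INSERT INTO metadata VALUES
  ('RUN1', 'PRJ1'),
  ('RUN2', 'PRJ1'),
  ('RUN3', NULL),    -- run without a BioProject
  ('RUN4', 'PRJ2');  -- run with no taxonomy analysis rows
INSERT INTO tax_analysis VALUES
  ('RUN1', 9606), ('RUN1', 562),
  ('RUN2', 9606),
  ('RUN3', 562);
""")


def count(sql):
    """Run a single-value COUNT query and return the integer result."""
    return conn.execute(sql).fetchone()[0]


# Step 1: every run in the metadata table.
total_runs = count("SELECT COUNT(DISTINCT acc) FROM metadata")

# Step 2: runs that survive the inner join with the taxonomy table.
with_tax = count("""
    SELECT COUNT(DISTINCT m.acc)
    FROM metadata AS m
    JOIN tax_analysis AS t ON m.acc = t.acc
""")

# Step 3: runs that also pass the BioProject filter, as in my query.
with_tax_and_project = count("""
    SELECT COUNT(DISTINCT m.acc)
    FROM metadata AS m
    JOIN tax_analysis AS t ON m.acc = t.acc
    WHERE m.bioproject IS NOT NULL
""")

print(total_runs, with_tax, with_tax_and_project)
```

In this toy case the count shrinks at each step, so comparing the same three counts on the real tables might show how much of the difference comes from the join and the filter rather than from update timing.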

Thank you in advance for your time and assistance.

NCBI SQL BigQuery SRA • 132 views

