Discrepancy in number of SRA records between NCBI website and BigQuery (SQL query)

16 months ago

marie.harmel ▴ 10

Hello,

I recently came across an inconsistency between the number of Sequence Read Archive (SRA) datasets reported on the NCBI website and the count obtained through a SQL query on BigQuery.

As of February 2024, the NCBI website (ncbi_sra) reports a total of 27,102,173 SRA records.

However, when running the following SQL query on BigQuery:

SELECT DISTINCT m.acc, m.sample_acc, m.biosample, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` AS m
INNER JOIN `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` AS tax
  ON m.acc = tax.acc
WHERE m.bioproject IS NOT NULL
ORDER BY m.bioproject, m.sra_study, m.biosample, m.sample_acc

I obtain 25,636,505 SRA records, which is 1,465,668 fewer than the website total.

Could this difference be explained by a lag between updates to the NCBI tables on BigQuery and updates to the data served directly through the NCBI website?
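Apart from update timing, the query itself may also narrow the count: the join keeps only runs that have at least one row in the tax analysis table, and the WHERE clause drops runs with no BioProject. Below is a minimal sketch of how the count could be broken down to see which step removes rows. It runs against invented toy tables in SQLite rather than the real BigQuery tables; the toy table names, the `tax_id` column, and the `RUN`/`PRJ` values are assumptions for illustration only.

```python
import sqlite3

# Toy stand-ins for the two BigQuery tables; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (acc TEXT, bioproject TEXT);
CREATE TABLE tax_analysis (acc TEXT, tax_id INTEGER);
INSERT INTO metadata VALUES
  ('RUN1', 'PRJ1'),
  ('RUN2', 'PRJ1'),
  ('RUN3', NULL),    -- no BioProject: removed by the WHERE clause
  ('RUN4', 'PRJ2');  -- no tax analysis rows: removed by the join
INSERT INTO tax_analysis VALUES
  ('RUN1', 9606), ('RUN1', 562),
  ('RUN2', 9606),
  ('RUN3', 562);
""")

# One count per filtering step, each over distinct run accessions.
DIAGNOSTIC_SQL = """
SELECT
  (SELECT COUNT(DISTINCT acc) FROM metadata) AS all_runs,
  (SELECT COUNT(DISTINCT acc) FROM metadata
    WHERE bioproject IS NOT NULL) AS runs_with_bioproject,
  (SELECT COUNT(DISTINCT m.acc) FROM metadata AS m
    JOIN tax_analysis AS t ON m.acc = t.acc) AS runs_with_tax,
  (SELECT COUNT(DISTINCT m.acc) FROM metadata AS m
    JOIN tax_analysis AS t ON m.acc = t.acc
    WHERE m.bioproject IS NOT NULL) AS runs_in_query
"""

all_runs, with_bioproject, with_tax, in_query = conn.execute(DIAGNOSTIC_SQL).fetchone()
print(f"all runs:             {all_runs}")
print(f"with a BioProject:    {with_bioproject}")
print(f"with tax analysis:    {with_tax}")
print(f"kept by both filters: {in_query}")
```

If the same breakdown were run on BigQuery with the real table names substituted, the first count would be the one most comparable to the website total, and any remaining gap would be a better candidate for an update lag.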

Thank you in advance for your time and assistance.

NCBI SQL BigQuery SRA
