What's the difference between EGA, ENA, GEO, SRA and ArrayExpress?
4.7 years ago
Shicheng Guo ★ 8.7k

GEO and SRA are quite easy to understand: they are for array and NGS data, respectively. What about the others? Is there any relationship between them? Why do we need so many different repositories? Any good suggestions for choosing which database to submit to?



Look at these links. They look helpful.


“The meta-data about your experiment will be stored at ArrayExpress, and the raw data files (e.g. fastq files) are stored at the Sequence Read Archive (SRA) of the European Nucleotide Archive (ENA). ArrayExpress will transfer the raw data files to the ENA for you so you do not need to submit those files separately to the ENA. You can also send us processed data (i.e. processed from the raw reads, e.g. BAM alignment files, differential expression data, expression values linked to genome coordinates, etc). Depending on the file format, it will either be stored at ArrayExpress or the ENA. Given the split of meta-data and data files between ArrayExpress and ENA, once your submission is fully processed, it is a lengthy process to modify/update it. Some changes (e.g. cancelling an ENA record which has been released to the public) will not be possible. Please take a look at our sequencing experiment update/cancellation policy before proceeding.”


“I have seen ArrayExpress experiment accessions with prefixes such as "E-MTAB", "E-GEOD", etc. What do the prefixes mean?

The prefixes indicate the source and/or submission route from which the data came from. The common ones are:
• MEXP = data submitted via the MIAMExpress submission route (discontinued since July 2014)
• TABM = data submitted via the Tab2MAGE submission route (discontinued since January 2012)
• MTAB = data submitted via the MAGE-TAB (discontinued since September 2014) or Annotare submission route
• GEOD = data imported from NCBI Gene Expression Omnibus”
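
The prefix list in that FAQ excerpt can be turned into a small lookup. This is only a rough sketch: the function name `describe_accession` and the assumed accession shape (`E-`, a four-letter prefix, a dash, then digits) are my own illustration, not anything ArrayExpress publishes, and the example accession numbers are made up.

```python
import re

# Prefix meanings copied from the ArrayExpress FAQ excerpt quoted above.
PREFIX_ROUTES = {
    "MEXP": "MIAMExpress submission route (discontinued July 2014)",
    "TABM": "Tab2MAGE submission route (discontinued January 2012)",
    "MTAB": "MAGE-TAB (discontinued September 2014) or Annotare submission route",
    "GEOD": "imported from NCBI Gene Expression Omnibus",
}

# Assumption: accessions look like "E-XXXX-123" (four letters, then digits).
ACCESSION_RE = re.compile(r"^E-([A-Z]{4})-(\d+)$")


def describe_accession(accession):
    """Return the submission route for an accession, or None if it does not parse."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    # Prefixes not listed in the FAQ excerpt are reported as unknown.
    return PREFIX_ROUTES.get(match.group(1), "unknown prefix")


if __name__ == "__main__":
    # Hypothetical accession numbers, used only to exercise the lookup.
    for acc in ["E-MTAB-1234", "E-GEOD-5678", "GSE1000"]:
        print(acc, "->", describe_accession(acc))
```

So an "E-GEOD" record is a copy of a GEO series, while an "E-MTAB" record was submitted to ArrayExpress directly.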

The GEO FAQ has many more details: http://www.ncbi.nlm.nih.gov/geo/info/faq.html


“The BioSample Database (http://www.ebi.ac.uk/biosamples) is a new database at EBI that stores information about biological samples used in molecular experiments, such as sequencing, gene expression or proteomics. The goals of the BioSample Database include: (i) recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; (ii) minimizing data entry efforts for EBI database submitters by enabling submitting sample descriptions once and referencing them later in data submissions to assay databases and (iii) supporting cross database queries by sample characteristics. Each sample in the database is assigned an accession number. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples@ebi.ac.uk”

4.7 years ago
Satyajeet Khare ★ 1.6k

I think SRA, ERA/ENA and DRA are the NGS data repositories of NCBI, EBI and DDBJ, respectively. I'm not sure whether GEO and ArrayExpress just store the metadata of NGS experiments for NCBI and EBI, respectively. NCBI, EBI and DDBJ are part of the International Nucleotide Sequence Database Collaboration (INSDC). EGA is a controlled-access data archive at EMBL-EBI. JGA and dbGaP are probably something similar at DDBJ and NCBI, respectively.
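
One practical way to see the NCBI/EBI/DDBJ split is in the run accessions themselves. The sketch below assumes the usual INSDC convention that run accessions start with SRR (NCBI SRA), ERR (EBI ENA) or DRR (DDBJ DRA). That convention is not stated in this thread, and the helper name `archive_for_run` and the example accession numbers are hypothetical.

```python
import re

# Assumption: first letter of a run accession marks the archive that issued it.
RUN_PREFIX_TO_ARCHIVE = {
    "SRR": "NCBI SRA",
    "ERR": "EBI ENA",
    "DRR": "DDBJ DRA",
}

# Three-letter prefix followed by digits, e.g. "SRR000001".
RUN_RE = re.compile(r"^([SED]RR)(\d+)$")


def archive_for_run(accession):
    """Return the archive that issued a run accession, or None if it does not parse."""
    match = RUN_RE.match(accession.strip().upper())
    if match is None:
        return None
    return RUN_PREFIX_TO_ARCHIVE[match.group(1)]


if __name__ == "__main__":
    # Made-up accession numbers, only to exercise the lookup.
    for acc in ["SRR000001", "ERR000002", "DRR000003", "GSM12345"]:
        print(acc, "->", archive_for_run(acc))
```

Because the three archives exchange data within INSDC, the prefix tells you where a run was submitted, not the only place it can be downloaded.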

