I recently downloaded some data from the EGA, including .cram files and the associated metadata. However, I am having some difficulty understanding how to match samples to the files correctly.
As shown in the attached screenshot, a single sample (e.g., sample_accession_id EGAN00001380723) is associated with multiple file_accession_ids (e.g., several EGAFxxxxx entries).
Does this mean that the sequencing data for this sample was uploaded in multiple .cram files, and I should merge these files to reconstruct the full dataset for that sample?
Unfortunately, I have not been able to reach either the first or corresponding author of the original publication, so I’m reaching out to see if you might have encountered a similar situation.
According to the EGA help page, there is a way to download the metadata for a dataset, which should give you more information about how the files relate to each sample.
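Once you have the metadata, a quick way to see which files belong to which sample is to group the file accessions by sample accession. Here is a rough sketch, not an official EGA tool: I'm assuming the sample-to-file mapping is available as a CSV table with the columns sample_accession_id and file_accession_id (the names from your screenshot). The actual layout of your download may differ, the EGAF and second EGAN IDs below are placeholders, and `group_files_by_sample` is just a helper name I made up.

```python
import csv
import io
from collections import defaultdict


def group_files_by_sample(rows):
    """Map each sample accession to the sorted list of file accessions linked to it."""
    files_by_sample = defaultdict(set)
    for row in rows:
        sample = row["sample_accession_id"].strip()
        file_id = row["file_accession_id"].strip()
        if sample and file_id:  # skip incomplete rows
            files_by_sample[sample].add(file_id)
    return {sample: sorted(files) for sample, files in files_by_sample.items()}


# Hypothetical mapping table. Only EGAN00001380723 comes from the question;
# every other ID is a placeholder, not a real accession.
example_csv = """sample_accession_id,file_accession_id
EGAN00001380723,EGAF_PLACEHOLDER_1
EGAN00001380723,EGAF_PLACEHOLDER_2
EGAN00001380723,EGAF_PLACEHOLDER_3
EGAN_PLACEHOLDER_B,EGAF_PLACEHOLDER_4
"""

mapping = group_files_by_sample(csv.DictReader(io.StringIO(example_csv)))

# List each sample with the number of files attached to it
for sample, files in mapping.items():
    print(sample, len(files), files)
```

To run it on your own data, replace the `io.StringIO(example_csv)` part with an open file handle for your mapping table. Note that this only tells you which files are attached to each sample; whether they should be merged still depends on what the individual files contain (for example, separate runs or lanes of the same sample), which the other metadata tables may describe.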