Would Anyone Share The METABRIC Breast Cancer Data With Me Please?
12.1 years ago
koukougogo ▴ 60

Hi folks, I'm working on breast cancer subtype identification and I'm dying for the METABRIC breast cancer data. I've contacted the deposit site, the data access committee, and the corresponding authors, but no one has replied yet. I have to admit that although it's a Nature paper and the data has been used for the DREAM7 challenge, it's almost impossible to get the data... Would anyone be willing to share the data with me, or does anyone have any clues where I can download it? I need the gene expression data and the clinical information. Big thanks!!

12.1 years ago

These data involve human subjects and potentially identifying information, so it would be highly unethical to share the protected parts of them. As far as I know, you will need to go through the process of requesting access through the Data Access Committee for protected data--there is no way around that.

The data and access details are available through here (at least):

Thanks for answering! I've contacted EBI but nobody has replied to me :( I guess I have to wait.

See EGA: Data Access for an overview of the procedure for requesting access to EGA data. Note that you must contact the appropriate Data Access Committees (DACs) directly for the data sets you want. Once the DACs have approved access, the European Genome-phenome Archive (EGA) staff will contact you with details of how to access the data.

Yes - I agree. For example, DREAM7 participants had to sign a release form saying that the data wouldn't be shared with anybody else or used for any purpose beyond the competition (without additional permission from the data owners).

7.0 years ago

You can download it from cBioPortal: http://www.cbioportal.org/datasets

Only the clinical data is publicly available for METABRIC; it can be downloaded from cBioPortal, as previously mentioned. The genomic and expression data are only available through a Data Access Committee request.

1 day ago

Answering because this old thread was recently 'boosted' due to a spam herbal remedy post...

Hey,

The METABRIC data is a bit of a pain to track down sometimes due to the access controls on the raw sequencing files, but the processed gene expression and clinical data are now publicly available from a few reliable sources without needing to chase authors or committees. I've worked with this dataset quite a bit during my PhD on breast cancer genetics, so here's what I'd recommend:

  1. cBioPortal: This is probably the most straightforward option for most people. The METABRIC study (from the 2012 Nature and 2016 Nat Commun papers) is hosted there with gene expression (mRNA, z-scores and raw), copy number, mutations, and full clinical annotations for ~2,500 samples. You can browse, visualize, and download the data directly—no login required for the bulk downloads. Just go to the study page, select the "Download" tab, and grab the clinical data file (e.g., data_clinical.txt) and expression data (e.g., data_mRNA_median_Zscores.txt or similar). Here's the direct link: https://www.cbioportal.org/study/summary?id=brca_metabric

  2. Synapse: If you need the data in a more structured format (including expression, CNV, SNP genotypes, and clinical traits), sign up for a free Synapse account (from Sage Bionetworks). Once logged in, you can access the full METABRIC bundle for independent research. It might require a quick terms-of-use agreement, but no formal DAC approval like EGA. Link: https://www.synapse.org/Synapse:syn1688369

  3. Via R/Bioconductor (recommended if you're analyzing in R): If you're doing subtype identification or any downstream analysis, skip the manual downloads and pull it directly into R using the MetaGxBreast package. This gives you an ExpressionSet object with the gene expression matrix and clinical/pheno data already integrated. Here's a quick code snippet to get you started:

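    # Install if needed
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("MetaGxBreast")
    
    # Load the packages and dataset
    library(MetaGxBreast)
    library(Biobase)  # provides exprs() and pData()
    loaded <- loadBreastEsets(loadString = "METABRIC")  # Loads METABRIC specifically
    # The result is a list; the ExpressionSets should sit in its `esets` element
    # (run str(loaded, max.level = 1) if your package version differs)
    metabric_eset <- loaded$esets$METABRIC  # Your ExpressionSet object
    
    # Quick peek
    exprs(metabric_eset)[1:5, 1:5]  # Corner of the gene expression matrix
    head(pData(metabric_eset))      # Clinical/pheno data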
    

    The vignette has more details: https://bioconductor.org/packages/release/data/experiment/vignettes/MetaGxBreast/inst/doc/MetaGxBreast.html. This package pulls from curated sources, so it's clean and ready for analysis.
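For the cBioPortal route in option 1, here is a rough base-R sketch of reading the two downloaded files and lining up samples. It assumes the usual cBioPortal tab-delimited layout: `#`-prefixed metadata lines at the top of the clinical file, and `Hugo_Symbol` and `Entrez_Gene_Id` as the first two columns of the expression file. It also assumes the expression column names match `PATIENT_ID`, which is worth checking against your actual download. The inline text stands in for the real files, and every ID and value in it is made up.

```r
# Stand-ins for the two downloaded files; every ID and value below is made up.
clinical_txt <- paste(
  "#Patient Identifier\tAge at Diagnosis\tSubtype",
  "#STRING\tNUMBER\tSTRING",
  "PATIENT_ID\tAGE_AT_DIAGNOSIS\tSUBTYPE",
  "PT-001\t61.5\tLumA",
  "PT-002\t48.0\tBasal",
  "PT-003\t70.2\tLumB",
  sep = "\n"
)
expression_txt <- paste(
  "Hugo_Symbol\tEntrez_Gene_Id\tPT-002\tPT-001\tPT-004",
  "ESR1\t2099\t-1.2\t0.8\t0.1",
  "ERBB2\t2064\t0.3\t-0.4\t1.9",
  sep = "\n"
)

# Clinical file: lines starting with '#' are metadata, so drop them on read.
# With a real download, replace `text = ...` with the file path.
clinical <- read.delim(text = clinical_txt, comment.char = "#",
                       stringsAsFactors = FALSE)

# Expression file: check.names = FALSE keeps hyphenated sample IDs intact
# (otherwise R rewrites "PT-001" as "PT.001").
expr_raw <- read.delim(text = expression_txt, check.names = FALSE,
                       stringsAsFactors = FALSE)
expr <- as.matrix(expr_raw[, -(1:2)])   # drop the two annotation columns
rownames(expr) <- expr_raw$Hugo_Symbol

# Keep only samples present in both tables, in the same order.
shared <- intersect(clinical$PATIENT_ID, colnames(expr))
clinical <- clinical[match(shared, clinical$PATIENT_ID), ]
expr <- expr[, shared, drop = FALSE]
stopifnot(identical(clinical$PATIENT_ID, colnames(expr)))
```

After this, row i of `clinical` describes column i of `expr`, which is the alignment most downstream subtyping or survival code expects.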

If you're after the raw sequencing data (e.g., for re-processing), that's still behind the EGA wall (EGAS00000000083) and requires approval from the METABRIC DAC—sounds like what you ran into. But for expression and clinical, the above should cover you. If you hit any snags or need help with analysis (e.g., PAM50 subtyping in R), feel free to follow up.

Kevin
