Publicly available lymphoma datasets
4.6 years ago
Stefan • 0

Dear all,

I am looking for publicly available genomic datasets of (human) lymphoma patients, preferably DLBCL. Specifically, I need variant files (e.g. in .vcf format) and/or methylation data. Alternatively, I could also work with raw .fastq or .bam files. The aim is to check whether specific non-coding regions contain pathogenic variants or are hyper-/hypomethylated. I would appreciate it if anyone could point me in the right direction. Please also mention datasets that are not publicly available but for which institutional access can be granted upon application. Thanks for your help!

Stefan

RNA-Seq Methylation Variants • 1.3k views
4.6 years ago
ATpoint 81k

Raw data from DLBCL samples (like most sequencing data from patients), e.g. the WGS cohorts from Ryan Morin (Blood 2013 and Nature Comm 2017) or the ~1000 exomes from Sandeep Dave (Cell 2017), are not publicly available. You will have to apply for access at e.g. dbGaP or the EGA to get fastq/bam access. Still, be aware that each of these cohorts amounts to several TB of data, so I recommend access to an HPC and experience in handling datasets of this size. Without decently parallelized pipelines this is going to take a long time to download and analyze. For VCF files I would probably try to contact the authors. I once interacted briefly with Ryan Morin by email; he was very kind and helpful. There is typically some variant information in the supplement of the papers, but this is then an Excel sheet and not a VCF. The WGS data from the cell lines they sequenced in that Blood paper from 2013 are publicly available at NCBI if you need them.
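If only a supplement table is available, it can be turned into a minimal VCF with a few lines of Python. The sketch below assumes the Excel sheet has been exported to CSV and that it has columns named `chrom`, `pos`, `ref` and `alt`; these column names, the function name `table_to_vcf` and the example rows are all hypothetical and will differ per paper. It only handles simple SNV rows. Indels written with "-" placeholders would need left-anchoring against the reference before they are valid VCF.

```python
import csv
import io


def table_to_vcf(rows):
    """Turn dict rows with chrom/pos/ref/alt keys into minimal VCF text.

    Column names are an assumption; rename them to match the actual table.
    """
    header = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    records = [
        (row["chrom"], int(row["pos"]), row["ref"], row["alt"]) for row in rows
    ]
    # Sort by chromosome name (plain string order) and position.
    records.sort()
    body = [f"{c}\t{p}\t.\t{r}\t{a}\t.\t.\t." for c, p, r, a in records]
    return "\n".join(header + body) + "\n"


# Made-up example rows, standing in for a CSV export of a supplement sheet.
example = "chrom,pos,ref,alt\nchr3,200,C,T\nchr2,100,G,A\n"
vcf_text = table_to_vcf(csv.DictReader(io.StringIO(example)))
print(vcf_text)
```

Note that the string sort puts chr10 before chr2, so a proper tool (or a sort against the reference's contig order) is needed before indexing the result.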

As you want to check non-coding parts of the genome, it might be easiest to contact the authors and ask for a VCF. I can already tell you that you will find little to no recurrence: I worked on a similar project and we did not find much in terms of recurrent SNVs in that cohort from 2013. Check the paper from Arthur et al. 2017 in Nat Comm with a cohort of > 100 WGS samples; imho there is no "huge" finding in the non-coding part there either.
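Once per-sample VCFs are in hand, checking recurrence in the regions of interest is straightforward. The following is a minimal sketch in plain Python, not what any of the cited papers did: the function names, the sample IDs, the region and the variant lines are all made up for illustration, and regions are assumed to be 1-based inclusive (BED files are 0-based, half-open, so convert first).

```python
from collections import defaultdict


def parse_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt) from VCF text lines, one tuple per ALT allele."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            yield chrom, int(pos), ref, allele


def in_regions(chrom, pos, regions):
    """True if pos falls inside any (chrom, start, end) region, 1-based inclusive."""
    return any(c == chrom and s <= pos <= e for c, s, e in regions)


def recurrent_variants(sample_variants, regions, min_samples=2):
    """Map each in-region variant to the sorted samples carrying it,
    keeping only variants seen in at least min_samples samples."""
    seen = defaultdict(set)
    for sample, variants in sample_variants.items():
        for chrom, pos, ref, alt in variants:
            if in_regions(chrom, pos, regions):
                seen[(chrom, pos, ref, alt)].add(sample)
    return {v: sorted(s) for v, s in seen.items() if len(s) >= min_samples}


# Toy cohort: two samples sharing one variant inside the region.
cohort = {
    "S1": ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t150\t.\tC\tT", "chr1\t900\t.\tG\tA"],
    "S2": ["chr1\t150\t.\tC\tT"],
}
regions = [("chr1", 100, 200)]
hits = recurrent_variants(
    {s: parse_vcf_lines(lines) for s, lines in cohort.items()}, regions
)
print(hits)
```

The linear scan over regions is fine for a handful of loci; for genome-wide region sets an interval index or a dedicated tool would be the better choice.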

This would be the WGS from around the Blood paper: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000235.v12.p2

Here the most recent WGS: https://ega-archive.org/dacs/EGAC00001000918

and the exomes: https://www.ebi.ac.uk/ega/studies/EGAS00001002606

Thank you for the detailed answer, I will look into it!

4.6 years ago

TCGA and ICGC are the two that immediately come to mind, though you'll have to apply for access to the raw data from either. However, the variant/methylation files and expression counts are often available.
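For TCGA, the open-access files can be listed through the GDC API. The sketch below only builds the query URL for the `files` endpoint and does not send a request. The project ID `TCGA-DLBC`, the filter field names and the data category strings are written from memory of the GDC documentation and should be treated as assumptions to verify there; the function name `gdc_open_files_url` is made up. ICGC is not covered here.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def gdc_open_files_url(project_id, data_category, size=100):
    """Build a GDC files-endpoint URL listing open-access files of one
    data category for one project (field names assumed, see lead-in)."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_category", "value": [data_category]}},
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


url = gdc_open_files_url("TCGA-DLBC", "DNA Methylation")
print(url)
```

Swapping the category for "Simple Nucleotide Variation" should list the open variant files instead; the returned file IDs can then be passed to the GDC data transfer tool or the download endpoint.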
