Question: Quick access/download to CCLE RNA-seq data (FASTQ/BAM files)
bstrs (European Union) wrote, 6 weeks ago:

Hello,

Are there faster ways to access the RNA-seq data (FASTQ or BAM files) for CCLE (https://portals.broadinstitute.org/ccle)?

For now the CCLE RNA-seq data (that is FASTQ files or BAM files) can be downloaded from:

but these are really slow: the download speed varies between 1 and 5 samples per day, and there are over 900 samples to download. At that rate it would take months (roughly 6 months to over 2 years) to download everything.

The perfect solution would be access to coverage plots (BigWig/BedGraph files) for CCLE and NOT read counts. Is there any public database that offers coverage plots for CCLE? Failing that, FASTQ/BAM files would be OK. Another option would be to get the CCLE RNA-seq data shipped on a hard drive, but who offers such a service?

Best regards,

Dan

rna-seq ccle • 207 views
Kevin Blighe wrote, 6 weeks ago:

Back in the 'old days', one could get data distributed on CD or DVD for free, but it is doubtful that any consortium nowadays is going to go out of its way to ship data to you on a hard disk.

For quicker download, you could try this tutorial: Fast download of FASTQ files from the European Nucleotide Archive (ENA)

The raw read counts are also available for download from the CCLE website.

CCLE data is also on cBioPortal.

I am also tentatively planning to develop an R Shiny app for interrogating the CCLE data, but have not yet started. Otherwise, I would have shared it with you. Kevin

Thanks Kevin but:

  • cBioPortal does not provide FASTQ/BAM/BigWig files for CCLE,

  • I need FASTQ/BAM/BigWig files and not raw read counts (this was specified in my original post),

  • I have tried Aspera and it is indeed faster, but not fast enough.

Reply by bstrs, 6 weeks ago

"cBioPortal does not provide FASTQ/BAM/BigWig files for CCLE"

Oh, I know, but thought that it could be of other utility.

"I have tried Aspera and it is indeed faster, but not fast enough"

Can you elaborate? What rates are you achieving, and where are you downloading to?

Reply by Kevin Blighe, 6 weeks ago

The download speed was already specified in the original post: on average 1-5 samples per 24 hours, using Aspera.

Reply by bstrs, 6 weeks ago

Kilobytes or megabytes per second (KB/s or MB/s) is the standard way to quote that. What is it about your infrastructure that results in a slow download rate? I have achieved download rates of >100 MB/s in the past using Aspera. Also, what is the urgency of this project?

Reply by Kevin Blighe, 6 weeks ago

Hi Kevin,

So the limitation is the download speed at my workplace (in a Nordic country). With Aspera I get on average 0.5 Mbps (at home I get 10 Mbps).
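To compare rates quoted in Mbps with rates quoted in MB/s, and to see what they mean for a 900-sample download, a small calculation along these lines may help. It is only a sketch: the function names are made up for illustration, and the 10 GB per-sample size is a placeholder assumption, not a figure from this thread.

```python
def mbps_to_megabytes_per_s(rate_mbps: float) -> float:
    """Convert megabits per second (Mbps) to megabytes per second (MB/s)."""
    return rate_mbps / 8.0


def days_to_download(n_samples: int, sample_size_gb: float, rate_mbps: float) -> float:
    """Estimate the days needed to download n_samples files of sample_size_gb each.

    Uses decimal units (1 GB = 1000 MB) and assumes the link sustains the
    quoted rate continuously, with no protocol overhead.
    """
    total_mb = n_samples * sample_size_gb * 1000.0
    seconds = total_mb / mbps_to_megabytes_per_s(rate_mbps)
    return seconds / 86400.0


if __name__ == "__main__":
    # ASSUMPTION: 10 GB per sample is a placeholder, not a value from the thread.
    assumed_sample_size_gb = 10.0
    # 0.5 and 10 Mbps are the rates quoted above; 800 Mbps corresponds to 100 MB/s.
    for rate in (0.5, 10.0, 800.0):
        days = days_to_download(900, assumed_sample_size_gb, rate)
        print(f"{rate:>6} Mbps -> {days:,.1f} days")
```

Substituting the real average file size for the placeholder gives a more realistic estimate for a given connection.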

Basically, I am looking for the cell lines that express (or express the most) a given exon of a given gene, where the exon is not annotated in any commonly used gene annotation database (Ensembl, GENCODE, etc.). The exon is annotated only in the CHESS database http://ccb.jhu.edu/chess/ (and validated in a few articles). This is why raw counts based on Ensembl, GENCODE, or RefSeq are not useful here.

In short, BAM slicing (or BigWig slicing) would do fine: I could get the raw counts for my exon of interest in all cell lines (that is an easy shell script to write, and I could provide it to anyone who has the BAM files for CCLE). The GDC portal supports BAM slicing for CCLE, but only for users with an NIH account (which I do not have). The irony is that this is possible for GTEx, because the BigWig files are available for all GTEx samples on recount2. CCLE is not in recount2 yet; recount2 will probably add CCLE next year.
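The per-exon counting described above could look roughly like the following Python sketch. It assumes the BAM slice has already been converted to SAM text (for example with `samtools view` on the region of interest) and counts reads whose aligned bases overlap the exon. The function names and all coordinates are hypothetical, and treating deletions as gaps is a simplification.

```python
import re
from typing import Iterable, List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_OPS = {"M", "=", "X"}  # aligned bases that consume the reference
GAP_OPS = {"D", "N"}           # consume the reference without aligned bases


def aligned_blocks(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive reference intervals covered by aligned bases."""
    blocks = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in ALIGNED_OPS:
            blocks.append((ref, ref + n - 1))
            ref += n
        elif op in GAP_OPS:
            ref += n  # e.g. an intron (N): skipped, so it does not count as coverage
        # I, S, H and P do not consume the reference
    return blocks


def count_exon_reads(sam_lines: Iterable[str], chrom: str, start: int, end: int) -> int:
    """Count mapped reads with at least one aligned block overlapping chrom:start-end."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":  # unmapped or off-target
            continue
        if any(b_start <= end and b_end >= start
               for b_start, b_end in aligned_blocks(pos, cigar)):
            count += 1
    return count
```

Splitting on `N` operations matters for RNA-seq: a spliced read that merely jumps over the exon is not counted as expressing it. Running this once per cell line on the sliced region would give the raw per-exon counts.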

The time frame is about 2-4 weeks.

If I put more info here, I guess I should start a new post as a different question!

I will send you a PM with more info if that is ok with you!

Reply by bstrs, 6 weeks ago