Question: Sending files to Galaxy
2.8 years ago, statfa (450) wrote:

Hi,

I'm very new to Galaxy and have just started using it. I'm sending FASTQ files from http://www.ebi.ac.uk/ena/data/ to Galaxy, but I have some questions:

  1. When I send the files to Galaxy, the sizes shown in Galaxy differ substantially from each other. The FASTQ files come from 4 people, so I would expect them all to have similar sizes. But for the first person the file is only 396 MB, and for the second person it is 3.9 GB... Is it possible that the files aren't being sent to Galaxy properly? The Galaxy history is green for all files, which should mean that the files were uploaded properly... So what could be the problem?

  2. I don't know how to use Galaxy on the cloud. Do you think usegalaxy.org could fulfill my needs of converting FASTQ files to BAM files and then obtaining read counts (RNA-seq study)? I have 16 samples in total, from 16 total RNA-seq datasets.

Thanks

galaxy fastq • 1.1k views
modified 2.8 years ago • written 2.8 years ago by statfa (450)
  1. That could be correct, but you'd have to give the ENA accession numbers so that someone can check what sizes are expected.
written 2.8 years ago by Devon Ryan (91k)

Here is a link to the data: http://www.ebi.ac.uk/ena/data/view/SRP063875

written 2.8 years ago by statfa (450)

There's at least a 4x difference in file sizes between the various samples there, so what you observed seems reasonable. For questions 2 and 3, try the Galaxy site.

written 2.8 years ago by Devon Ryan (91k)

Another thing I have just noticed is that not only do the sizes differ (from 390 MB to 8.5 GB), but when I click on the files in the history tab, some files show a "chart bar" icon and a preview such as:

@SRR2454055.1 1/1
TGCTCCTCTCCACAGGGAAACTCCACTCCAGTGCTCAGCTTGCACCCTGGCACAGGCCAGCAGTTGCTGGAAGTCAGACACCTGCAGATGAAGACCACAG
+
CBBFFFFFHHHGHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIIHHHHHEFFFFCEDEDEDDDDCCDDDDDDDDDDDDB
@SRR2454055.2 2/1

while other files don't show this.

written 2.8 years ago by statfa (450)

What's the data type of the ones with a "chart bar" option? The bit you pasted is the sort of thing that should be seen.
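
For reference, a FASTQ record is four lines per read: a header starting with "@", the sequence, a separator line starting with "+", and a quality string the same length as the sequence. If you have a local copy of a file, a rough structural check along these lines would catch a copy that was cut off partway through. This is only a minimal sketch: the function names are made up for illustration, it assumes unwrapped four-line records, and it does not validate the bases or quality characters themselves.

```python
import gzip


def check_fastq(handle):
    """Scan a text stream of 4-line FASTQ records.

    Returns (n_complete_records, problem), where problem is None if the
    stream ended cleanly on a record boundary.
    """
    n = 0
    while True:
        header = handle.readline()
        if not header:
            return n, None  # clean end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            return n, f"record {n + 1} is incomplete (file may be truncated)"
        if not header.startswith("@"):
            return n, f"record {n + 1}: header does not start with '@'"
        if not plus.startswith("+"):
            return n, f"record {n + 1}: separator line does not start with '+'"
        if len(seq.rstrip("\r\n")) != len(qual.rstrip("\r\n")):
            return n, f"record {n + 1}: sequence and quality lengths differ"
        n += 1


def check_fastq_file(path):
    """Open a plain or gzip-compressed FASTQ file and run check_fastq on it."""
    opener = gzip.open if path.endswith(".gz") else open
    try:
        with opener(path, "rt") as handle:
            return check_fastq(handle)
    except EOFError:
        # Python's gzip module raises EOFError when a .gz file ends early.
        return 0, "gzip stream ended early (file is truncated)"
```

Calling `check_fastq_file("sample.fastq.gz")` (the file name is a placeholder) returns the number of complete reads and either `None` or a short description of the first problem found.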

written 2.8 years ago by Devon Ryan (91k)

I just realized that the green message doesn't necessarily mean the files were uploaded completely. I uploaded the files again and the file sizes changed (the 390 MB file was replaced by a 4.2 GB file when I uploaded it again)... So now I'm confused: how can I know whether my files are uploaded completely when I don't know their exact sizes and the green message isn't proof?
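
One general way to answer this kind of question, outside Galaxy, is to compare a local copy of the file against the size and MD5 checksum that the archive publishes for it, where such values are available. The sketch below assumes you already have the expected checksum and byte count from the archive's file listing; `md5_and_size` and `matches_expected` are illustrative names, not part of any Galaxy or ENA tool. The published numbers describe the file as distributed (usually gzip-compressed), so this applies to a local download rather than to whatever Galaxy stores after import.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=1 << 20):
    """Return (md5_hex, size_in_bytes) for a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so multi-GB FASTQ files do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


def matches_expected(path, expected_md5, expected_bytes):
    """True only if both the checksum and the byte count match."""
    actual_md5, actual_bytes = md5_and_size(path)
    return actual_md5 == expected_md5.lower() and actual_bytes == expected_bytes
```

If `matches_expected(...)` returns `False`, the local copy differs from what the archive lists, and a byte count that is too small points to a truncated transfer.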

modified 2.8 years ago • written 2.8 years ago by statfa (450)

I should note that over on the Galaxy biostar site there are a bunch of people reporting issues related to disk space right now. It's quite possible that there's some sort of hardware issue at the moment.

written 2.8 years ago by Devon Ryan (91k)

Using public Galaxy is a convenience many enjoy at no cost. While it works most of the time, I don't think they claim to provide a foolproof service. This sort of thing may be beyond the control of both Galaxy and ENA staff.

modified 2.8 years ago • written 2.8 years ago by genomax (69k)

This is an unrelated comment, but we seem to have been seeing this on Biostars recently.

Is this one of those cases where your supervisor has told you to do something without taking into account resources/expertise you have access to? While it may be possible to do this all online it will be much simpler to do it locally. Have you tried to talk with the person about the hurdles you are facing and to see if they can help you gain access to local resources?

written 2.8 years ago by genomax (69k)

Well, this is my thesis. I have to obtain read counts and I've been struggling a lot these days to find the best way. I've searched the internet a lot but I still feel confused. The problem is Galaxy sucks! I've been trying to upload files from "EBI", but I have to repeat the process more than 10 times for each file and in the end it's still not uploaded completely. Every time, an error occurs. I don't know why this is a popular platform when I've been encountering so many issues since I started using it. I don't know if I'm on the right path.

modified 2.8 years ago • written 2.8 years ago by statfa (450)

I will repeat my comment above about talking with your supervisor about the difficulties you are facing. I am not sure what part of the world you are from but there has to be some local infrastructure available to do this type of thing.

Over at NCBI SRA there is a way to download pre-aligned data from this study. If you visit this link and click on the "Alignment" tab, which is the second from the left, you can download all samples from this study aligned to GRCh37 (select scope "same study", output into "BAM" and send to file). The alignments are split by chromosome and it may be a pain to download the 22+ files, but at least that can save you a lot of time. You would still be left with doing the counts locally (what OS do you have access to?), but it gets you a step closer.

modified 2.8 years ago • written 2.8 years ago by genomax (69k)