Question: Computer specs for analysis of transcriptional datasets from the literature
giroudpaul (European Union) wrote, 2.2 years ago:

Hi Biostar Community,

I'm starting my PhD in a small company, and my subject will require me to mine data from transcriptional analyses in the literature (microarray/RNA-seq). I might also have to analyse a couple of RNA-seq experiments myself.

I have already analyzed microarray data, and I know you don't need a beast to do it, but I have read that for RNA-seq, especially on human data, you need considerable computing power if you have to align raw reads.

I don't have access to a computer cluster, and it is uncertain that I will.

I am looking for advice on what solutions are available to me. Would a laptop be sufficient to re-analyse published data? Are published RNA-seq data mostly raw data? Do I need to look at a desktop solution? Or is it not possible without an external computer cluster?

Thanks for your help!

Tags: hardware
Devon Ryan (Freiburg, Germany) wrote, 2.2 years ago:

Most public datasets are just FASTQ files, so your life will be easier if you can get access to either a cluster or at least a single larger server (e.g., 32 cores and 100 GB of RAM).

If you're doing a PhD then you should have some affiliation with a university or research institute (after all, a company can't award a PhD) and they should be able to help you find compute resources locally.

WouterDeCoster (Belgium) wrote, 2.2 years ago:
  • "to mine data from transcriptional analyses in the literature (microarray/RNA-seq)"
  • "I might also have to analyse a couple of RNA-seq experiments myself."

Those are two very different requirements. For the first you don't need much; a normal laptop should be fine. For the second, it depends on how many samples (and how many reads) you want to process. You could also have a look at Galaxy as a platform to work on.

If you need to analyze RNA-seq, do you have to repeat the pipeline from the original paper exactly, or can you do something else? There are "more recent" approaches which are pretty fast and don't need a beast: have a look at Kallisto/Sailfish/Salmon for pseudoalignment and read counting. Those might save you a headache.
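
As a rough illustration of how light such a run can be, here is a minimal Python sketch that only builds a `salmon quant` command line for one paired-end sample. The helper name `salmon_quant_command`, the file names, and the thread count are assumptions made up for the example; it also assumes Salmon is installed and that an index was built beforehand with `salmon index`.

```python
import shlex


def salmon_quant_command(index_dir, reads_1, reads_2, out_dir, threads=4):
    """Build (but do not run) a `salmon quant` command for one paired-end sample."""
    return [
        "salmon", "quant",
        "-i", str(index_dir),  # index built beforehand with `salmon index`
        "-l", "A",             # let Salmon infer the library type
        "-1", str(reads_1),    # FASTQ file with the first mates
        "-2", str(reads_2),    # FASTQ file with the second mates
        "-p", str(threads),    # number of threads to use
        "-o", str(out_dir),    # output directory (will contain quant.sf)
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = salmon_quant_command("index", "s1_1.fastq.gz", "s1_2.fastq.gz", "quant/s1")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps it easy to loop over many samples and to hand to `subprocess.run` once the paths are real.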


Thank you both for your answers!

I do have an affiliation with a research institute linked to a hospital, but access to a bioinformatics server from outside the institute still has to be discussed. You confirm my fear that most publicly accessible data are raw data.

I used Galaxy before with ChIP-seq data, but I had access to an institutional Galaxy platform. Now I would only have access to the open-access instance, which has a 250 GB data limit.

Another limitation might be network connectivity: the network is pretty bad here. It may be upgraded, but for now, exchanging GBs of data is out of the question...

What you said, WouterDeCoster, about the original pipeline is interesting. I don't see any reason why I should follow the paper's pipeline if a newer/better approach is available.

What about Bioconductor? Is RNA-seq analysis with R possible? Does it make sense in terms of computing power?
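
To put a number on the network constraint, a back-of-the-envelope estimate can help. The sketch below is only arithmetic; the function name `transfer_hours` and the example figures (100 GB over a sustained 10 Mbit/s link) are hypothetical placeholders, not measurements from this thread.

```python
def transfer_hours(size_gb, mbit_per_s):
    """Hours needed to move size_gb gigabytes at a sustained rate in Mbit/s."""
    if mbit_per_s <= 0:
        raise ValueError("mbit_per_s must be positive")
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 1000 MB = 8000 Mbit
    return megabits / mbit_per_s / 3600  # seconds -> hours


if __name__ == "__main__":
    # Hypothetical example: 100 GB of FASTQ files over a 10 Mbit/s connection.
    print(f"{transfer_hours(100, 10):.1f} hours")
```

Plugging in the real dataset size and the measured link speed gives a quick idea of whether downloading raw reads is realistic at all.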

(reply written 2.2 years ago by giroudpaul)

"I don't see any reason why I should follow the paper's pipeline if a newer/better approach is available."

Alright, then you can try the lighter methods. Those are definitely good, but it could be that your assignment required you to replicate the analysis exactly, so that's why I asked.

"What about Bioconductor? Is RNA-seq analysis with R possible? Does it make sense in terms of computing power?"

Typically you first need to do the (pseudo)alignment, followed by read counting (which can be done in R), followed by differential expression analysis (which is done in R). So yes, you can do a lot in R/Bioconductor.
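
To make the hand-off between those steps concrete, here is a small Python sketch that merges per-sample Salmon `quant.sf` files (tab-separated, with `Name` and `NumReads` columns) into one transcripts-by-samples table that could then be loaded into R. The function names and the sample-to-path mapping are assumptions for illustration; in a real Bioconductor workflow, a package such as tximport would typically do this import step.

```python
import csv


def read_numreads(quant_sf):
    """Return {transcript_name: estimated_read_count} from one quant.sf file."""
    with open(quant_sf, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["NumReads"]) for row in reader}


def write_count_matrix(quant_files, out_path):
    """Merge per-sample counts into one tab-separated transcripts x samples table.

    quant_files maps a sample name to the path of its quant.sf file.
    Transcripts missing from a sample are written as 0.0.
    """
    samples = sorted(quant_files)
    counts = {s: read_numreads(quant_files[s]) for s in samples}
    transcripts = sorted(set().union(*(c.keys() for c in counts.values())))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["transcript"] + samples)
        for name in transcripts:
            writer.writerow([name] + [counts[s].get(name, 0.0) for s in samples])
```

Only the quantification step needs the raw reads; a merged table like this is small enough to analyse on a laptop.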

(reply written 2.2 years ago by WouterDeCoster)
jrj.healey (United Kingdom) wrote, 2.2 years ago:

I'm guessing based on this:

"but I have read that for RNA-seq, especially on human data"

Your new role will require you to analyse human data?

If that isn't the case and you're based in the UK, there are facilities like MRC CLIMB, which provides free virtual machines for microbial bioinformatics (it's based in my lab at uni): http://www.climb.ac.uk/

Similarly, if you have absolutely no access to any physical hardware via the uni or otherwise, you can consider AWS cloud computing solutions (e.g., Amazon EC2 for compute and S3 for storage). If you aren't going to need the hardware indefinitely, some on-demand VMs might be a good compromise, though you'd have to do the maths to figure out whether it's more cost-effective.
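
"Doing the maths" can be as simple as a break-even calculation. The sketch below is a deliberately crude model: the function names are made up for the example, the prices are hypothetical placeholders rather than real quotes, and it ignores storage, data transfer, electricity, and maintenance costs.

```python
def break_even_hours(server_price, hourly_rate):
    """Hours of on-demand VM use at which renting costs as much as buying."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return server_price / hourly_rate


def cheaper_option(server_price, hourly_rate, expected_hours):
    """Return "cloud" if renting for expected_hours costs less than buying, else "buy"."""
    return "cloud" if expected_hours * hourly_rate < server_price else "buy"


if __name__ == "__main__":
    # Hypothetical numbers: a 5000-unit server vs. a VM at 1.0 unit per hour.
    print(break_even_hours(5000, 1.0))     # hours of VM time that equal the purchase price
    print(cheaper_option(5000, 1.0, 300))  # a few hundred hours of use favours the cloud here
```

With occasional re-analysis of a handful of datasets, the expected compute hours are often far below the break-even point, but the real prices need to be checked.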


Human data only, and not based in the UK, but thanks for the offer :)

Yeah, I read about the cloud computing solutions, but as with the possible university cluster, I first need to learn how to use this kind of solution, as I've only worked locally so far.

(reply written 2.2 years ago by giroudpaul)

They don't usually require too much configuration, and then you connect via SSH and it's pretty much as if you were working locally :)

(reply written 2.2 years ago by jrj.healey)