Question

Computer specs for analysis of transcriptional datasets from the literature

7.1 years ago

giroudpaul ▴ 70

Hi Biostar Community,

I'm starting my PhD in a small company, and my subject will require mining transcriptional data (microarray/RNA-seq) from the literature. I might also have to analyse a couple of RNA-seq experiments myself.

I have already analysed microarray data, and I know you don't need a beast to do it, but I have read that for RNA-seq, especially on human data, you need considerable computing power if you have to align raw reads.

I don't have access to a computer cluster, and it is uncertain whether I will.

I'm looking for advice on which solutions are available to me. Would a laptop be sufficient to re-analyse published data? Are published RNA-seq data mostly raw data? Do I need to look at a desktop solution? Or is it not possible without an external computer cluster?

Thanks for your help!

hardware • 1.5k views

Updated 7.1 years ago by Joe 21k • written 7.1 years ago by giroudpaul ▴ 70

score 2 · Answer 1 · 2017-03-30

Most public datasets are just FASTQ files, so your life will be easier if you can get access to either a cluster or at least a single larger server (e.g., 32 cores and 100 GB of RAM).

If you're doing a PhD then you should have some affiliation with a university or research institute (after all, a company can't award a PhD) and they should be able to help you find compute resources locally.

score 1 · Answer 2 · 2017-03-30

7.1 years ago

WouterDeCoster 47k

to mine data from literature transcriptional analysis (microarray/RNA-seq)
I might also have to analyses a couple RNA-seq experiment myself.

Those are two very different requirements. For the first you don't need a lot; a normal laptop should be fine. For the second, it depends on how many samples (and how many reads) you want to process. You can also have a look at Galaxy as a platform to work on.

If you need to analyze RNA-seq, do you have to repeat the pipeline from the original paper exactly, or can you do something else? There are "more recent" approaches which are pretty fast and don't need a beast: have a look at Kallisto/Sailfish/Salmon for pseudoalignment and read counting. Those might save you a headache.
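
To give an idea of how light this route is, here is a minimal sketch of quantifying one paired-end sample with Salmon, driven from Python. It only builds the command; it assumes Salmon is installed and that an index has already been built with `salmon index`. The index directory, FASTQ file names and output path are hypothetical placeholders.

```python
import shlex


def salmon_quant_cmd(index_dir, reads_1, reads_2, out_dir, threads=4):
    """Build (but do not run) a `salmon quant` command for one paired-end sample."""
    return [
        "salmon", "quant",
        "-i", index_dir,       # transcriptome index built beforehand
        "-l", "A",             # let Salmon infer the library type
        "-1", reads_1,         # first mates
        "-2", reads_2,         # second mates
        "-p", str(threads),    # number of threads
        "-o", out_dir,         # output directory (will contain quant.sf)
    ]


# Hypothetical file names, for illustration only.
cmd = salmon_quant_cmd(
    "human_index",
    "sample1_R1.fastq.gz",
    "sample1_R2.fastq.gz",
    "quant/sample1",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would actually launch the job; on a laptop you would mostly adjust `threads` to the number of cores you can spare.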

Thank you both for your answers!

I do have an affiliation with a research institute linked to a hospital, but access to a bioinformatics server from outside the institute still has to be discussed. You confirm my fear that most publicly accessible data are raw data.

I used Galaxy before with ChIP-seq data, but I had access to an institutional Galaxy platform. Now I would only have access to the public server, which has a 250 GB data limit.

Another limitation might be network connectivity: the network is pretty bad here. It may be upgraded, but for now, exchanging GBs of data is out of the question.
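
A back-of-the-envelope sketch of these two constraints, in Python. The per-sample size (5 GB of compressed FASTQ) and the link speed (10 Mbit/s) are assumptions for illustration, not measurements; only the 250 GB quota comes from the situation described above.

```python
def transfer_hours(size_gb, speed_mbit_s):
    """Hours needed to move size_gb gigabytes at a sustained speed in Mbit/s."""
    megabits = size_gb * 1000 * 8  # decimal units: 1 GB = 1000 MB = 8000 Mbit
    return megabits / speed_mbit_s / 3600


def samples_within_quota(quota_gb, gb_per_sample):
    """Number of whole samples of a given size that fit under a storage quota."""
    return int(quota_gb // gb_per_sample)


# Assumed numbers: 5 GB per sample, 10 Mbit/s sustained throughput.
print(f"{transfer_hours(5, 10):.1f} h to transfer one sample")
print(samples_within_quota(250, 5), "samples of raw data fit in 250 GB")
```

Intermediate files (alignments, indexes) also count against a quota, so the real number of samples that fit would be lower than this estimate.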

What you said, WouterDeCoster, about the original pipeline is interesting. I don't see any reason why I should follow the paper's pipeline if a newer/better approach is available.

What about Bioconductor? Is RNA-seq analysis with R possible? Does it make sense in terms of computing power?

7.1 years ago by giroudpaul ▴ 70

I don't see any reason why I should follow the paper pipeline if there is new/better available approach.

Alright, then you can try the lighter methods. Those are definitely good, but it could be that your assignment required you to replicate the analysis exactly, which is why I asked.

What about Bioconductor ? Is RNAseq analysis with R possible ? Does it make sense in term of computing power ?

Typically you first need to do the (pseudo)alignment, followed by read counting (which can be done in R), followed by differential expression analysis (which is done in R). So yes, you can do a lot in R/Bioconductor.
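
To illustrate the hand-off between the quantification step and R, here is a small Python sketch that merges per-sample Salmon `quant.sf` tables into one count matrix. It assumes Salmon's usual tab-separated columns (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`); the sample names and values are made up. In practice the Bioconductor package tximport does this import on the R side, so this is only meant to show the shape of the data.

```python
import csv
import io


def read_num_reads(quant_sf_text):
    """Map transcript name -> estimated read count from quant.sf content."""
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def count_matrix(samples):
    """Combine samples (dict: sample name -> quant.sf text) into (header, rows)."""
    per_sample = {name: read_num_reads(text) for name, text in samples.items()}
    transcripts = sorted(set().union(*[c.keys() for c in per_sample.values()]))
    header = ["transcript"] + list(per_sample)
    # Transcripts missing from a sample are given a count of 0.
    rows = [[t] + [per_sample[s].get(t, 0.0) for s in per_sample]
            for t in transcripts]
    return header, rows


# Made-up example data in quant.sf layout.
COLUMNS = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
demo = {
    "ctrl": COLUMNS + "tx1\t1000\t800.0\t10.0\t120.0\ntx2\t500\t300.0\t5.0\t30.0\n",
    "treated": COLUMNS + "tx1\t1000\t800.0\t20.0\t240.0\ntx2\t500\t300.0\t1.0\t6.0\n",
}
header, rows = count_matrix(demo)
print(header)
for row in rows:
    print(row)
```

A matrix like this (transcripts in rows, samples in columns) is the kind of input that differential expression packages in R expect, after summarising to gene level where needed.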

7.1 years ago by WouterDeCoster 47k

score 0 · Answer 3 · 2017-03-30

7.1 years ago

Joe 21k

I'm guessing based on this:

but I have read that for RNAseq, especially on human data

Your new role will require you to analyse human data?

If that isn't the case and you're based in the UK, there are facilities like MRC CLIMB, which provides free virtual machines for microbial bioinformatics (it's based in my lab at uni): http://www.climb.ac.uk/

Similarly, if you have absolutely no access to any physical hardware via the uni or otherwise, you can consider Amazon AWS cloud computing solutions (EC2 for compute, S3 for storage). If you aren't going to need the hardware indefinitely, some on-demand VMs might be a good compromise, though you'd have to do the maths to figure out whether it's more cost-effective.
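
The maths can be sketched in a few lines of Python. All prices and run times below are hypothetical placeholders, not real quotes from any cloud provider or hardware vendor; plug in your own numbers.

```python
def break_even_hours(purchase_price, hourly_rate):
    """VM hours after which renting has cost as much as buying the hardware."""
    return purchase_price / hourly_rate


def project_cloud_cost(n_samples, hours_per_sample, hourly_rate):
    """Total rental cost for processing n_samples one after another."""
    return n_samples * hours_per_sample * hourly_rate


# Assumed figures: a 3000-unit workstation versus a VM at 1.50 per hour,
# and 40 samples needing 2 VM hours each.
print(break_even_hours(3000, 1.50), "VM hours to match the purchase price")
print(project_cloud_cost(40, 2, 1.50), "total cloud cost for the project")
```

This ignores storage, data transfer fees, electricity and maintenance, so it is a first approximation only.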

Human data only, and not based in the UK, but thanks for the offer :)

Yeah, I read about cloud computing solutions, but as with the possible university cluster, I first need to learn how to use this kind of solution, as I've only worked locally so far.

7.1 years ago by giroudpaul ▴ 70

They don't usually require too much configuration, and once you connect via SSH it's pretty much as if you were working locally :)

7.1 years ago by Joe 21k