Forum: R or Python, which do you prefer for analysing scRNA-seq datasets?
wt215 wrote (10 months ago):

Hi,

The number of cells in an scRNA-seq experiment can be very large. Recent 10X datasets are especially big: one contains around 1.3 million cells.

R seems to have trouble even loading the raw gene-by-cell count table. I am not very familiar with bioinformatics in Python; can Python handle such large datasets easily?

With datasets this large, many normalization methods that rely on Bayesian inference or optimization algorithms can be time-consuming. Which language do you think wins here, R or Python?

Thanks in advance.

Tags: python, rna-seq, forum, R

Software is only as good as the underlying algorithm. If the algorithm is flawed, an implementation that runs faster in one particular language does not make that language or package a winner.

Good programmers will work around technical difficulties. Parts of a program can be written in a different language (if that offers technical advantages) and then called from within the main program.

(Reply by genomax, 10 months ago)
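As one illustration of calling code written in another language: in Python, NumPy and SciPy run their array arithmetic in compiled code, so even a per-cell normalization needs no Python-level loop over a million cells. The sketch below is an assumption for illustration, not something proposed in the thread: a simple library-size normalization (scale each cell to a fixed total, then take log1p) on a sparse cells x genes matrix. The function name `normalize_total_log1p` and the default target of 10,000 counts are hypothetical choices.

```python
import numpy as np
import scipy.sparse as sp


def normalize_total_log1p(counts, target_sum=1e4):
    """Hypothetical helper: scale each cell (row) to `target_sum` counts, then log1p.

    All arithmetic is done by NumPy/SciPy routines implemented in compiled
    code; there is no Python-level loop over cells.
    """
    counts = counts.tocsr().astype(np.float64)  # copy, so the input is untouched
    totals = np.asarray(counts.sum(axis=1)).ravel()  # total counts per cell
    # Empty cells get a scale of 0 instead of dividing by zero.
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    normalized = (sp.diags(scale) @ counts).tocsr()  # row-wise scaling, stays sparse
    # log1p(0) == 0, so only the stored non-zeros need transforming.
    normalized.data = np.log1p(normalized.data)
    return normalized
```

Keeping the matrix sparse throughout means memory scales with the number of non-zero counts rather than with cells x genes.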

Yes, I agree. I am a bit worried that hardware development cannot keep pace with the development of scRNA-seq techniques.

The data keep getting bigger, both the raw sequencing FASTQ files and the count tables, which store ever more cells.

I really hope that one day my laptop will be able to handle both FASTQ preprocessing and downstream analysis easily.

(Reply by wt215, 10 months ago)

"my laptop"

Who told you that was an acceptable platform?

(Reply by Jeremy Leipzig, 10 months ago)

Large datasets are always going to require access to appropriately sized hardware. Ideally you would have access through your company, institute, or university; if that is not an option, cloud providers already offer solutions that will fit, although they will be pricey if you pay out of pocket.

Your laptop (if it retains that form factor in future) may eventually handle much larger data, but sadly we are still some way from that day.

(Reply by genomax, 10 months ago)

Which software do you think that could win, R or python?

Neither of those is software; they are programming languages. Both can be completely shit when you don't use them right, and both can solve your issue with loading raw gene-cell expression data if you use them correctly.

A lot of scRNAseq packages are written in R.

(Reply by WouterDeCoster, 10 months ago)
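To make "using them correctly" concrete for the loading problem: scRNA-seq count matrices are usually mostly zeros, so storing them in a sparse format rather than as a dense table is what keeps memory manageable, in R (e.g. the Matrix package) as much as in Python. Below is a minimal Python sketch with NumPy/SciPy; the helper names and the simulated counts are hypothetical. It also round-trips the matrix through Matrix Market (`.mtx`), the sparse text format of Cell Ranger's `matrix.mtx` output, without ever building a dense copy.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp


def simulate_sparse_counts(n_genes, n_cells, density, seed=0):
    """Hypothetical genes x cells count matrix built directly in sparse form.

    Random (row, col) positions are drawn; duplicate positions are summed
    when converting to CSC, so the stored non-zero count may be slightly
    below density * n_genes * n_cells.
    """
    rng = np.random.default_rng(seed)
    nnz = int(density * n_genes * n_cells)
    rows = rng.integers(0, n_genes, size=nnz)
    cols = rng.integers(0, n_cells, size=nnz)
    vals = rng.integers(1, 20, size=nnz, dtype=np.int32)
    return sp.coo_matrix((vals, (rows, cols)), shape=(n_genes, n_cells)).tocsc()


def dense_nbytes(mat):
    """Memory a dense array of the same shape and dtype would need."""
    return mat.shape[0] * mat.shape[1] * mat.dtype.itemsize


def sparse_nbytes(mat):
    """Memory actually used by the three CSC/CSR storage arrays."""
    return mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes


def roundtrip_matrix_market(mat):
    """Write to and read back from a Matrix Market (.mtx) file, staying sparse."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.mtx")
        scipy.io.mmwrite(path, mat)
        return scipy.io.mmread(path).tocsc()


if __name__ == "__main__":
    counts = simulate_sparse_counts(n_genes=2000, n_cells=1000, density=0.05)
    print("dense bytes: ", dense_nbytes(counts))
    print("sparse bytes:", sparse_nbytes(counts))
    loaded = roundtrip_matrix_market(counts)
    print("round trip identical:", (loaded != counts).nnz == 0)
```

On this simulated example (2,000 genes x 1,000 cells, 5% non-zero) the sparse arrays take roughly a tenth of the dense size; the real ratio depends on how sparse your data are.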

Sorry, my mistake: it should be language rather than software. Thanks for pointing it out.

(Reply by wt215, 10 months ago)

Your bottleneck is likely not going to be the choice of language. It is going to be the availability of existing packages that do what you want. Python will likely be faster for loading large datasets, but if there aren't already packages for scRNA-seq analysis, are you going to spend the time to write your own? I guess it comes down to which is the better use of your time: writing something new in the faster language, or cobbling together existing tools in either language to accomplish your goal.

(Reply by Damian Kao, 10 months ago)

Languages are tools and if Python and R are my only choices, I pick Rython.

(Reply by Eric Lim, 10 months ago)
Powered by Biostar version 2.3.0