Question

Forum: R or Python, which one do you prefer for analysing scRNA-seq datasets?

0

6.2 years ago

wt215 • 0

Hi,

The number of cells in an scRNA-seq experiment can be very large. Recent 10X datasets in particular can contain around 1.3 million cells.

R seems to have trouble even loading the raw gene-cell expression count table. I am not very familiar with bioinformatics in Python; can Python handle such a large dataset easily?

Given such large datasets, many normalization methods that rely on Bayesian methods or optimization algorithms could be time-consuming. Which language do you think would win, R or Python?

Thanks in advance.

R RNA-Seq python • 5.3k views

updated 14 months ago by Ram 44k • written 6.2 years ago by wt215 • 0

4

Software is only as good as the underlying algorithm. If that is flawed then software (using that algorithm) running faster with one particular language does not make that language/package a winner.

Good programmers will work around technical difficulties. Parts of a program can be coded in a different language (if that offers technical advantages) and then called from within a program.
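As a minimal sketch of that idea (not anything from a specific scRNA-seq package): Python's standard `ctypes` module can call a function compiled from C. The example assumes a Unix-like system where the C math library can be located, and the wrapper name `c_log1p` is made up for illustration.

```python
import ctypes
import ctypes.util

# Locate the C math library. If it cannot be found by name, fall back to the
# symbols already loaded into the running Python process (Unix-like systems).
_libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(_libm_path) if _libm_path else ctypes.CDLL(None)

# Declare the C signature: double log1p(double)
libm.log1p.argtypes = [ctypes.c_double]
libm.log1p.restype = ctypes.c_double


def c_log1p(x):
    """Hypothetical wrapper: compute log(1 + x) using the C implementation."""
    return libm.log1p(float(x))


# Toy counts, transformed by C code called from Python
counts = [0, 1, 9, 99]
log_counts = [c_log1p(c) for c in counts]
print(log_counts)
```

The same pattern, with the heavy lifting in compiled code and the glue in a scripting language, is available from R as well, so the choice of front-end language does not fix the speed of the inner loop.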

6.2 years ago by GenoMax 144k

0

Yes, I agree. I am a bit worried that the development of hardware cannot keep up with the development of scRNA-seq techniques.

The data keeps getting bigger, both the raw FASTQ sequencing data and, as a result, the number of cells stored in the count table.

I really hope that one day my laptop can easily handle both preprocessing FASTQ files and downstream analysis.

6.2 years ago by wt215 • 0

3

"my laptop"

who told you that was an acceptable platform?

6.2 years ago by Jeremy Leipzig 22k

0

Large datasets are always going to require access to appropriately sized hardware. Ideally you would have access via your company/institute/university, but if that is not an option, cloud-based providers do have solutions that will fit, even now. They will be pricey to pay for out of pocket.

Your laptop (if it retains that form factor in future) may handle much larger data but we are sadly a ways away from that day.

6.2 years ago by GenoMax 144k

2

"Which software do you think that could win, R or python?"

Neither of those is software; they are programming languages. Both can be completely shit when you don't use them right, and both can solve your issue with loading raw gene-cell expression data if you use them correctly.

A lot of scRNAseq packages are written in R.
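To put the loading problem in numbers, here is a rough back-of-the-envelope sketch of why a naive dense table struggles in either language while a sparse layout is much more manageable. Only the 1.3 million cell count comes from the question; the gene count, the fraction of non-zero entries, the byte sizes and the function names are illustrative assumptions.

```python
def dense_bytes(n_genes, n_cells, bytes_per_value=4):
    """Memory for a dense matrix that stores every entry, zeros included."""
    return n_genes * n_cells * bytes_per_value


def csc_bytes(n_genes, n_cells, density, bytes_per_value=4, bytes_per_index=4):
    """Memory for a compressed sparse column (CSC) layout.

    Stores one value and one row index per non-zero entry,
    plus one 8-byte column pointer per cell (and one extra).
    """
    nnz = round(n_genes * n_cells * density)
    column_pointers = (n_cells + 1) * 8
    return nnz * (bytes_per_value + bytes_per_index) + column_pointers


N_CELLS = 1_300_000  # from the question
N_GENES = 28_000     # assumption: order of magnitude for an annotated genome
DENSITY = 0.07       # assumption: fraction of non-zero counts

dense_gb = dense_bytes(N_GENES, N_CELLS) / 1e9
sparse_gb = csc_bytes(N_GENES, N_CELLS, DENSITY) / 1e9
print(f"dense:  {dense_gb:.1f} GB")
print(f"sparse: {sparse_gb:.1f} GB")
```

Under these assumptions the dense table needs roughly 146 GB and the sparse one roughly 20 GB, so how the table is stored matters far more than which language reads it.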

6.2 years ago by WouterDeCoster 47k

0

Sorry, my mistake, it should be language rather than software. Thanks for pointing it out.

6.2 years ago by wt215 • 0

0

Your bottleneck is likely not going to be the choice of language. It's going to be the availability of existing packages to do what you want to do. Python will likely be faster for loading large datasets, but if there aren't already packages for scRNA-seq analysis, are you going to spend the time to write your own? I guess it will come down to which is the better use of your time: writing something new in the faster language or cobbling together existing things in either language to accomplish your goal.