Highly used R packages with no Python equivalent
2.7 years ago
t • 0

I will be porting an R library to Python as part of my bachelor's thesis. DESeq2 for differential gene expression seems like a good candidate. Are there any others that make people stick with R?

package python r software • 3.0k views

Make sure that whatever you do has been discussed with an experienced supervisor who understands how much time you have available and how much time such a project would take. For something like DESeq2, it is not enough to simply translate the R code to Python. There are also extensive package dependencies that would need a Python equivalent, e.g. the SummarizedExperiment container format that is used by default, and the dependencies for fold change shrinkage and parallelization. This does not sound like something that can easily be done as part of a BSc thesis.
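
To make the container point concrete, here is a minimal sketch, assuming a plain-Python design, of what even a stripped-down SummarizedExperiment equivalent has to guarantee: every assay matrix stays aligned with the row (gene) and column (sample) metadata when you subset it. The class name `MiniSummarizedExperiment` and its `subset` method are hypothetical and not a port of the Bioconductor API; the real class also handles genomic ranges, experiment-level metadata and much more.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiniSummarizedExperiment:
    """Hypothetical, minimal stand-in for a SummarizedExperiment-style container.

    Assays are matrices stored as lists of rows: rows are features (genes),
    columns are samples. row_data/col_data hold one metadata dict per row/column.
    """
    assays: Dict[str, List[List[float]]]
    row_names: List[str]
    col_names: List[str]
    row_data: List[dict] = field(default_factory=list)
    col_data: List[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        n_rows, n_cols = len(self.row_names), len(self.col_names)
        # Give every feature/sample an (empty) metadata record if none was passed.
        if not self.row_data:
            self.row_data = [{} for _ in range(n_rows)]
        if not self.col_data:
            self.col_data = [{} for _ in range(n_cols)]
        if len(self.row_data) != n_rows or len(self.col_data) != n_cols:
            raise ValueError("metadata length does not match the dimensions")
        # Core invariant: every assay has exactly n_rows x n_cols entries.
        for name, matrix in self.assays.items():
            if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
                raise ValueError(f"assay {name!r} has the wrong shape")

    def subset(self, rows: List[str], cols: List[str]) -> "MiniSummarizedExperiment":
        """Select features and samples by name, slicing assays and metadata together."""
        ri = [self.row_names.index(r) for r in rows]
        ci = [self.col_names.index(c) for c in cols]
        return MiniSummarizedExperiment(
            assays={k: [[m[i][j] for j in ci] for i in ri] for k, m in self.assays.items()},
            row_names=list(rows),
            col_names=list(cols),
            row_data=[self.row_data[i] for i in ri],
            col_data=[self.col_data[j] for j in ci],
        )


# Usage with made-up counts: two genes, three samples.
se = MiniSummarizedExperiment(
    assays={"counts": [[10, 20, 30], [0, 5, 1]]},
    row_names=["geneA", "geneB"],
    col_names=["s1", "s2", "s3"],
    col_data=[{"condition": "ctrl"}, {"condition": "ctrl"}, {"condition": "treat"}],
)
sub = se.subset(["geneB"], ["s3", "s1"])
print(sub.assays["counts"])  # [[1, 0]]
```

Keeping the shape check in `__post_init__` means a misaligned container can never be constructed, which is the guarantee downstream code (like a DESeq2 port) would rely on.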

Slightly off-topic, but I would work a bit on the wording here. IMO, it's the bazillion unique packages, the package management, the syntax, the focus on functional programming (FP), and the convenient way of writing packages that make people use R. And the backwards compatibility ;)

Regarding porting: how about searching on Bioconductor, if you have to do something biological?

Why not port something from Python to R?

One thing that might be worth porting over to Python is a subset of the data set packages (e.g., gene annotations). Is something like biomaRt already available in Python?

Unless you have a statistics background, you might not want to port a stats package.

2.7 years ago

The biggies are obviously DESeq2, limma and edgeR, but they are massive packages doing some very complex statistics, and also have dependency trees that would need to be considered.
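
As a rough illustration of the statistics involved, here is a sketch of only the very first step of a DESeq2-style workflow: library-size normalization with the median-of-ratios method described in the DESeq/DESeq2 papers. This is an independent plain-Python sketch, not DESeq2's code, and the function name is hypothetical; the dispersion estimation, negative binomial GLM fits and fold change shrinkage that follow are where the real complexity lives.

```python
import math
from statistics import median
from typing import List


def median_of_ratios_size_factors(counts: List[List[int]]) -> List[float]:
    """Per-sample size factors from a genes x samples count matrix.

    For each gene, compute the geometric mean across samples; each sample's
    size factor is the median (over genes) of count / geometric mean.
    Genes with a zero in any sample are skipped: their geometric mean is 0.
    """
    n_samples = len(counts[0])
    usable = []  # (row, log geometric mean) for genes with no zero counts
    for row in counts:
        if all(c > 0 for c in row):
            usable.append((row, sum(math.log(c) for c in row) / n_samples))
    if not usable:
        raise ValueError("every gene has at least one zero count")
    factors = []
    for j in range(n_samples):
        # Work on the log scale, take the median, then transform back.
        log_ratios = [math.log(row[j]) - log_gm for row, log_gm in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up counts where sample 2 was sequenced twice as deeply as sample 1;
# the zero-containing first gene is ignored.
print(median_of_ratios_size_factors([[0, 3], [10, 20], [5, 10], [100, 200]]))
# approximately [0.7071, 1.4142], i.e. 1/sqrt(2) and sqrt(2)
```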

Depending on your background, you might want to look into the rtracklayer/GenomicRanges ecosystem. While I personally am not a fan, I know they are very popular, and AFAIK no standard for genomic features has arisen in Python (we have our own classes for dealing with GTF/BED files). The other thing that R has that Python doesn't (AFAIK) is tools for creating genomic graphics: Gviz and ggbio equivalents.
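
For a flavour of what the GenomicRanges side involves, here is a minimal sketch of an overlap query, the kind of operation `findOverlaps()` provides in R. It assumes BED-style 0-based, half-open coordinates (GenomicRanges itself uses 1-based, closed coordinates), and the `Interval`, `build_index` and `find_overlaps` names are hypothetical. A real package would use a proper interval data structure and handle strand, metadata columns and many more range operations.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str = ""


Index = Dict[str, Tuple[List[int], List[Interval]]]


def build_index(subjects: List[Interval]) -> Index:
    """Group intervals by chromosome, sorted by start position."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for iv in subjects:
        by_chrom[iv.chrom].append(iv)
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort(key=lambda x: x.start)
        index[chrom] = ([x.start for x in ivs], ivs)
    return index


def find_overlaps(query: Interval, index: Index) -> List[Interval]:
    """Return indexed intervals that share at least one base with `query`."""
    if query.chrom not in index:
        return []
    starts, ivs = index[query.chrom]
    # Only intervals starting before the query ends can overlap it;
    # of those, keep the ones ending after the query starts.
    stop = bisect_left(starts, query.end)
    return [iv for iv in ivs[:stop] if iv.end > query.start]


# Made-up gene coordinates for illustration.
genes = build_index([
    Interval("chr1", 100, 200, "geneA"),
    Interval("chr1", 150, 300, "geneB"),
    Interval("chr1", 400, 500, "geneC"),
    Interval("chr2", 100, 200, "geneD"),
])
print([iv.name for iv in find_overlaps(Interval("chr1", 250, 400), genes)])  # ['geneB']
```

Note that geneC is not reported: with half-open coordinates an interval starting at 400 does not overlap one ending at 400. Getting conventions like this consistent is a large part of what a shared standard would provide.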

As people have already mentioned, another big thing missing from Python is all the annotation-related packages in Bioconductor (the AnnotationDbi packages and biomaRt). These might also be more manageable for a BSc thesis than some of the giant packages mentioned above.
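
One core design problem in the annotation packages is that identifier mappings are often one-to-many, so any port has to decide what to return for ambiguous keys; AnnotationDbi's `mapIds()` exposes this choice through its `multiVals` argument. Below is a minimal, hypothetical plain-Python sketch of that idea. `map_ids` and its `multi_vals` options are loosely modelled on, not copied from, the R function, and the identifiers are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple, Union

Mapped = Union[Optional[str], List[str]]


def map_ids(
    pairs: Iterable[Tuple[str, str]],
    keys: List[str],
    multi_vals: str = "first",
) -> Dict[str, Mapped]:
    """Map each key to its annotation value(s) given (key, value) pairs.

    multi_vals decides what happens when a key maps to several values:
      "first" -> the first value seen (None if unmapped)
      "list"  -> every value, in order ([] if unmapped)
      "asNA"  -> None unless the key maps to exactly one value
    """
    lookup: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        if value not in lookup[key]:  # ignore duplicate rows
            lookup[key].append(value)
    result: Dict[str, Mapped] = {}
    for key in keys:
        values = lookup.get(key, [])
        if multi_vals == "first":
            result[key] = values[0] if values else None
        elif multi_vals == "list":
            result[key] = list(values)
        elif multi_vals == "asNA":
            result[key] = values[0] if len(values) == 1 else None
        else:
            raise ValueError(f"unsupported multi_vals: {multi_vals!r}")
    return result


# Made-up mapping table: "id2" is ambiguous, "id3" is unmapped.
table = [("id1", "SYM1"), ("id2", "SYM2a"), ("id2", "SYM2b")]
print(map_ids(table, ["id1", "id2", "id3"]))          # {'id1': 'SYM1', 'id2': 'SYM2a', 'id3': None}
print(map_ids(table, ["id1", "id2", "id3"], "asNA"))  # {'id1': 'SYM1', 'id2': None, 'id3': None}
```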

Finally, while pandas and dplyr are quite good at matching each other's features, I think there is stuff in packages like tidyr that isn't in pandas.
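
As one example of that gap (as far as I know), tidyr's `complete()`, which adds rows for every missing combination of some columns, has no single pandas method, whereas `pivot_longer()`/`pivot_wider()` map fairly directly onto `DataFrame.melt`/`DataFrame.pivot`. `complete()` can be emulated with `pd.MultiIndex.from_product` and `DataFrame.reindex`; the `complete` helper below is a hypothetical sketch of that, assuming each combination occurs at most once in the input.

```python
import pandas as pd


def complete(df: pd.DataFrame, cols: list, fill: dict) -> pd.DataFrame:
    """Rough emulation of tidyr::complete(df, <cols>, fill = list(...)).

    Adds a row for every combination of the unique values in `cols`,
    then fills the gaps in other columns from `fill`. Assumes each
    combination appears at most once in `df`.
    """
    full_index = pd.MultiIndex.from_product(
        [sorted(df[c].unique()) for c in cols], names=cols
    )
    out = df.set_index(cols).reindex(full_index).reset_index()
    # reindex inserts NaN for new rows, so integer columns come back as float.
    return out.fillna(fill)


# Made-up long-format counts; the s2/geneB combination is missing.
long = pd.DataFrame({
    "sample": ["s1", "s1", "s2"],
    "gene": ["geneA", "geneB", "geneA"],
    "count": [5, 3, 7],
})
print(complete(long, ["sample", "gene"], {"count": 0}))
```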

In fact, the thing that brings me back to R again and again, apart from things like DESeq2 etc., is ggplot. I've tried the Python plotting libraries, and I just don't love any of them as much as I love ggplot. I wouldn't recommend trying to port that, though. There are at least 2 ports out there already that haven't really succeeded. It's unlikely that any port ever will: RStudio has, like, a whole team of coders employed to maintain ggplot, and a one-off port is never going to compete.

ggplot has been implemented in Python, though; take a look at plotnine.

That's what I said: "There are at least 2 ports out there already that haven't really succeeded"

I thought PyRanges was supposed to be the pythonic GenomicRanges equivalent. No?
