I will be porting an R library to Python as part of my bachelor's thesis. DESeq2 for differential gene expression seems like a good candidate. Are there any others that make people stick with R?
The biggies are obviously DESeq2, limma and edgeR, but they are massive packages doing some very complex statistics, and also have dependency trees that would need to be considered.
Depending on your background, you might want to look into the rtracklayer/GenomicRanges ecosystem. While I personally am not a fan, I know they are very popular, and AFAIK no standard for genomic features has arisen in Python (we have our own classes for dealing with GTF/BED files). The other thing that R has and Python doesn't (AFAIK) is tools for creating genomic graphics, i.e. equivalents of Gviz and ggbio.
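To give a feel for what the very core of a genomic-ranges port involves, here is a minimal sketch of an interval type with a BED parser and an overlap test. The names `Interval` and `parse_bed_line` are made up for illustration (they are not our in-house classes, nor any existing library's API), and the only assumption is the standard BED convention of 0-based, half-open coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical minimal genomic interval (BED-style coordinates)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    def overlaps(self, other: "Interval") -> bool:
        # Half-open intervals overlap only if each starts before the other ends.
        return (
            self.chrom == other.chrom
            and self.start < other.end
            and other.start < self.end
        )


def parse_bed_line(line: str) -> Interval:
    """Read the first three tab-separated BED columns; extra columns are ignored."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return Interval(chrom, int(start), int(end))


a = parse_bed_line("chr1\t100\t200\tgeneA")
b = parse_bed_line("chr1\t150\t250\tgeneB")
c = parse_bed_line("chr1\t200\t300\tgeneC")

print(a.overlaps(b))  # True: 150 falls inside [100, 200)
print(a.overlaps(c))  # False: the intervals only touch at position 200
```

The real work in something like GenomicRanges is everything beyond this: strand handling, metadata columns, and fast overlap queries over millions of ranges, which a naive pairwise check like the one above does not address.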
As people have already mentioned, another big thing missing from Python is all the annotation-related packages in Bioconductor: the AnnotationDbi packages and biomaRt. These might also be more manageable for a BSc thesis than some of the giant packages mentioned above.
Finally, while pandas and dplyr are quite good at matching each other's features, I think there is stuff in packages like tidyr that isn't in pandas.
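For context on where the two do line up: the basic reshaping that tidyr does with `pivot_longer` and `pivot_wider` has rough pandas counterparts in `DataFrame.melt` and `DataFrame.pivot`. A small sketch with a made-up count table (the gene and sample names are just placeholders):

```python
import pandas as pd

# Toy "wide" table: one row per gene, one column per sample.
wide = pd.DataFrame({
    "gene": ["g1", "g2"],
    "sample_a": [10, 20],
    "sample_b": [30, 40],
})

# Wide -> long, roughly what tidyr::pivot_longer does.
long = wide.melt(id_vars="gene", var_name="sample", value_name="count")

# Long -> wide again, roughly what tidyr::pivot_wider does.
back = long.pivot(index="gene", columns="sample", values="count").reset_index()
back.columns.name = None  # drop the leftover "sample" label on the column index

print(long)
print(back)
```

So any gaps are more likely in the less common tidyr helpers than in the basic reshaping, and it would be worth checking each one against current pandas before deciding it is worth porting.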
In fact, the thing that brings me back to R again and again, apart from things like DESeq2 etc., is ggplot. I've tried the Python plotting libraries, and I just don't love any of them as much as I love ggplot. I wouldn't recommend trying to port that though. There are at least 2 ports out there already that haven't really succeeded. It's unlikely that any port ever will: RStudio has, like, a whole team of coders employed to maintain ggplot, and a one-off port is never going to compete.
Be sure that whatever you do has been discussed with an experienced supervisor who understands how much time you have available and how much time such a project would take. For something like DESeq2 it is not enough to simply translate the R code to Python. There are also extensive package dependencies that would need a Python equivalent, e.g. the SummarizedExperiment container format that is used by default, and the dependencies for fold change shrinkage and parallelization. This does not sound like something that can easily be done as part of a BSc thesis.
Slightly off-topic, but I would work a bit on language here. IMO, it's the bazillion unique packages, the package management, the syntax, the focus on FP, and the convenient way of writing packages that makes people use R. And the backwards compatibility ;)
Regarding porting: how about searching on Bioconductor, if you have to do something biological?
Why not port something from Python to R? One thing I think might be of value porting over to Python might be a subset of these data set packages (e.g., gene annotations). Is something like biomaRt available in Python already?
Unless you are stats you might not want to port a stats package.