Forum: Rust bioinformatics project ideas
13 months ago
bompipi95 ▴ 150

Hi everyone,

I want to work on a bioinformatics project to pick up Rust and to improve my computer science / programming knowledge in general. I don't mind reinventing the wheel a little, since my main goal is to learn the language and demonstrate that knowledge through a project. I am a biologist turned bioinformatician with experience in R and Python, and I work primarily on RNA-seq/scRNA-seq data analysis. Could you suggest a useful & accessible project to start with? I am thinking of projects like speeding up an existing tool, developing more efficient BAM parsers, etc.

Thank you!

projects scRNA-seq RNA-seq Rust

Thanks for the reply! Do you suggest working on existing Rust tool repositories (e.g. submitting pull requests) rather than developing a tool separately?

13 months ago
ATpoint 81k

I would start by reimplementing something with a comparatively limited codebase, such as seqtk. It's a great parser with several submodules, yet not overly complex compared to something like an aligner or VEP. It's heavily used and fast, so you can benchmark your version against its C implementation.
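To give a feel for what a first seqtk-style subcommand could look like, here is a minimal sketch of FASTQ-to-FASTA conversion (roughly what seqtk seq -a does) using only the Rust standard library. It assumes uncompressed FASTQ with exactly four lines per record; the function name fastq_to_fasta is hypothetical, and the real seqtk additionally handles gzipped and multi-line input.

```rust
use std::io::{self, BufRead, Write};

/// Convert FASTQ records to FASTA, roughly like `seqtk seq -a`.
/// Assumption: every record is exactly four lines (no wrapped sequences)
/// and the input is not compressed. Returns the number of records written.
fn fastq_to_fasta<R: BufRead, W: Write>(reader: R, writer: &mut W) -> io::Result<usize> {
    let mut lines = reader.lines();
    let mut n = 0;
    while let Some(header) = lines.next() {
        let header = header?;
        if header.is_empty() {
            continue; // tolerate blank lines, e.g. a trailing one
        }
        // Each `??` first handles "line missing", then "I/O error reading it".
        let seq = lines.next().ok_or_else(|| truncated("sequence"))??;
        let plus = lines.next().ok_or_else(|| truncated("'+' line"))??;
        let _qual = lines.next().ok_or_else(|| truncated("quality"))??;
        if !header.starts_with('@') || !plus.starts_with('+') {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("malformed record: {header}"),
            ));
        }
        // Swap the leading '@' for '>' and drop the quality line.
        writeln!(writer, ">{}", &header[1..])?;
        writeln!(writer, "{seq}")?;
        n += 1;
    }
    Ok(n)
}

/// Error for a record that ends before all four lines were read.
fn truncated(what: &str) -> io::Error {
    io::Error::new(
        io::ErrorKind::UnexpectedEof,
        format!("truncated record: missing {what}"),
    )
}
```

Called from a main with io::stdin().lock() as the reader and io::stdout() as the writer, this becomes a tiny filter you can time against seqtk on the same file.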


Ah yes, great suggestion!

13 months ago

https://github.com/Ensembl/ensembl-vep

One of the most widely used tools in all of bioinformatics, written in Perl.


Could you suggest a useful & ====>accessible<==== project to start with...


Parts of this could be ported over piecemeal.


Thanks, Jeremy Leipzig, for the suggestion! Perl is unfortunately above my head at the moment, and working on this project would require me to learn two (!) languages. I will probably choose something else to work on. Maybe porting something written in Python/R is the best place for me to start.

13 months ago
cmdcolin ★ 3.8k

It's hard to recommend what to write, but here are two small CLI projects I made with Rust: https://github.com/cmdcolin/vcfverifier and https://github.com/cmdcolin/secondary_rewriter. Use htslib or noodles for BAM/CRAM parsing, or parse the SAM-format text output of e.g. samtools view. secondary_rewriter just parses the text, while vcfverifier uses htslib for BGZF file parsing.
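For the text-parsing route, here is a minimal sketch (not taken from either linked project) that reads SAM text, e.g. piped from samtools view, and counts primary mapped alignments per reference sequence. It relies only on the SAM specification's tab-separated mandatory columns and FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name primary_counts_per_ref is hypothetical. To read BAM/CRAM directly, the line parsing would be swapped for htslib or noodles.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufRead};

// SAM FLAG bits, as defined in the SAM specification.
const FLAG_UNMAPPED: u16 = 0x4;
const FLAG_SECONDARY: u16 = 0x100;
const FLAG_SUPPLEMENTARY: u16 = 0x800;

/// Count primary, mapped alignments per reference name (RNAME, column 3)
/// from SAM text. Header lines (starting with '@') are skipped.
fn primary_counts_per_ref<R: BufRead>(reader: R) -> io::Result<BTreeMap<String, u64>> {
    let mut counts = BTreeMap::new();
    for line in reader.lines() {
        let line = line?;
        if line.is_empty() || line.starts_with('@') {
            continue;
        }
        let fields: Vec<&str> = line.split('\t').collect();
        // SAM alignment lines have 11 mandatory columns.
        if fields.len() < 11 {
            return Err(invalid(format!("expected >= 11 columns, got {}", fields.len())));
        }
        let flag = fields[1]
            .parse::<u16>()
            .map_err(|_| invalid(format!("bad FLAG: {}", fields[1])))?;
        // Keep only primary alignments that are actually mapped.
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY) != 0 {
            continue;
        }
        *counts.entry(fields[2].to_string()).or_insert(0) += 1;
    }
    Ok(counts)
}

/// Wrap a message as an InvalidData I/O error.
fn invalid(msg: String) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}
```

A BTreeMap is used so the per-reference counts come out sorted by name, which makes the output easy to diff against another tool's.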

I don't know a lot about single-cell sequencing, but since you do, try to bring in your expertise! Perhaps calculate expression or coverage per cell from a BAM file, or something like that.
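One possible starting point for the per-cell idea is counting distinct UMIs per cell barcode. The sketch below assumes Cell Ranger-style BAMs in which the corrected cell barcode is stored in the CB tag and the corrected UMI in the UB tag, and it reads SAM text (e.g. from samtools view) rather than BAM directly. The helper names tag_value and umis_per_cell are hypothetical, and the result is UMIs per cell summed over all genes; per-gene expression would additionally need each read's gene assignment.

```rust
use std::collections::{HashMap, HashSet};
use std::io::{self, BufRead};

/// Look up an optional SAM field (TAG:TYPE:VALUE, columns 12+) by tag name.
fn tag_value<'a>(fields: &[&'a str], tag: &str) -> Option<&'a str> {
    fields.iter().skip(11).find_map(|&f| {
        let mut parts = f.splitn(3, ':');
        match (parts.next(), parts.next(), parts.next()) {
            (Some(t), Some(_ty), Some(v)) if t == tag => Some(v),
            _ => None,
        }
    })
}

/// Count distinct UMIs (UB) per cell barcode (CB) from SAM text.
/// Skips headers, short lines, unparsable FLAGs, unmapped/secondary/
/// supplementary records, and reads lacking either tag.
fn umis_per_cell<R: BufRead>(reader: R) -> io::Result<HashMap<String, usize>> {
    let mut seen: HashMap<String, HashSet<String>> = HashMap::new();
    for line in reader.lines() {
        let line = line?;
        if line.is_empty() || line.starts_with('@') {
            continue;
        }
        let fields: Vec<&str> = line.split('\t').collect();
        if fields.len() < 11 {
            continue;
        }
        let flag = match fields[1].parse::<u16>() {
            Ok(f) => f,
            Err(_) => continue,
        };
        // 0x4 unmapped, 0x100 secondary, 0x800 supplementary.
        if flag & (0x4 | 0x100 | 0x800) != 0 {
            continue;
        }
        if let (Some(cb), Some(ub)) = (tag_value(&fields, "CB"), tag_value(&fields, "UB")) {
            seen.entry(cb.to_string()).or_default().insert(ub.to_string());
        }
    }
    // Collapse each cell's UMI set to its size.
    Ok(seen.into_iter().map(|(cb, umis)| (cb, umis.len())).collect())
}
```

Storing a set of UMIs per barcode deduplicates PCR duplicates of the same molecule; a natural next step is keying the map by (barcode, gene) instead of barcode alone.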


Thanks for the thoughtful reply and suggestions! It's very useful & kind of you to share your personal Rust projects. This is something I aspire to do!

5 months ago

https://github.com/algolab/pantas

This probably should not have been written in Python (it takes 2 hours to run), but the fact that it was gives someone an opportunity to port it over.
