I'm looking for a bioinformatics problem that could benefit from a new or improved algorithm, or from being mapped to the GPU architecture
9.5 years ago
tyneuroth

Hello. I am an undergrad majoring in computer science. I am learning GPGPU programming right now and would like to find a good problem to apply it to. Could someone point me in the right direction for finding a relatively simple but important bioinformatics problem, to whose solution I could contribute by writing an improved or massively parallel algorithm?


Short-read alignment or genome assembly, for example.

9.5 years ago

Some ideas:

• How about the calculation of linkage disequilibrium (LD)? It's relatively basic math (check these slides for a quick introduction) and takes about a second for just a couple of SNPs. However, if you want to do genome-wide LD analysis over many chromosomes with several hundred thousand SNPs, you'll wait hours for results, because every SNP is compared against every other SNP. This could benefit from a pre-processing step that splits the SNPs into several sets, assigns each set to a different thread, and then calculates LD on several CPUs (or GPUs).
• How about BLAST? As far as I know there's already one GPU-BLAST implementation, but it can only align protein sequences, so you could go for nucleotide alignment. The problem here might be that the algorithms involved are a bit more complicated (I guess you could skip database creation and focus on the alignment itself?).
• Ab initio gene prediction: there are a couple of programs, such as SNAP or Augustus, that predict genes with Hidden Markov Models, but none of them has a parallel (let alone a GPU) implementation.
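The all-against-all LD step from the first bullet can be sketched as follows, in Python for readability. Assumptions: genotypes are unphased allele counts (0/1/2), and LD is measured as the squared Pearson correlation between genotype vectors (genotype r², an approximation of haplotype-based r², not D'). The function names and tile size are hypothetical. The point is the tiling: each tile of SNP pairs reads only its own rows and writes only its own results, so tiles are independent work units for CPU workers or GPU thread blocks.

```python
from itertools import combinations_with_replacement


def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, report 0
    return cov * cov / (v1 * v2)


def tile_pairs(n_snps, tile):
    """Start indices (si, sj), si <= sj, of the tiles covering the upper triangle."""
    starts = range(0, n_snps, tile)
    return list(combinations_with_replacement(starts, 2))


def ld_tile(genotypes, si, sj, tile):
    """r^2 for every pair i < j with i in tile si and j in tile sj.

    Independent of every other tile: this is the unit to hand to one worker.
    """
    n = len(genotypes)
    out = {}
    for i in range(si, min(si + tile, n)):
        for j in range(max(sj, i + 1), min(sj + tile, n)):
            out[(i, j)] = r_squared(genotypes[i], genotypes[j])
    return out


def all_pairs_ld(genotypes, tile=256):
    """All pairwise r^2 values; the loop body could be dispatched in parallel."""
    result = {}
    for si, sj in tile_pairs(len(genotypes), tile):
        result.update(ld_tile(genotypes, si, sj, tile))
    return result


if __name__ == "__main__":
    # Four SNPs (rows) genotyped in four individuals (columns).
    g = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1], [0, 0, 1, 2]]
    for pair, r2 in sorted(all_pairs_ld(g, tile=2).items()):
        print(pair, round(r2, 3))
```

The quadratic growth is what makes this worth parallelizing: n SNPs give n(n-1)/2 pairs, so 500,000 SNPs give roughly 1.25 × 10^11 pairs.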
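For the BLAST suggestion, here is a rough sketch of the seed-and-extend idea behind nucleotide search: index every length-k word of the database, look up each query word to get seed hits, then extend each seed without gaps until the running score falls more than `x_drop` below the best score seen. This is a simplified illustration, not NCBI BLAST's implementation: the word size, match/mismatch scores, X-drop value and score threshold are illustrative assumptions, and real blastn also does gapped extension and E-value statistics. On a GPU, the per-seed extensions are the natural unit of parallel work.

```python
from collections import defaultdict


def build_word_index(db, k):
    """Map every length-k word of the database to its start positions."""
    index = defaultdict(list)
    for p in range(len(db) - k + 1):
        index[db[p:p + k]].append(p)
    return index


def extend_ungapped(query, db, q, d, k, match=1, mismatch=-3, x_drop=5):
    """Extend seed (q, d) right and left without gaps, stopping on an X-drop.

    Returns (score, q_start, q_end, d_start); q_end is exclusive.
    """
    # Extend to the right of the seed.
    best_right, score, right, i = 0, 0, 0, 0
    while q + k + i < len(query) and d + k + i < len(db):
        score += match if query[q + k + i] == db[d + k + i] else mismatch
        i += 1
        if score > best_right:
            best_right, right = score, i
        elif best_right - score > x_drop:
            break
    # Extend to the left of the seed.
    best_left, score, left, i = 0, 0, 0, 0
    while q - 1 - i >= 0 and d - 1 - i >= 0:
        score += match if query[q - 1 - i] == db[d - 1 - i] else mismatch
        i += 1
        if score > best_left:
            best_left, left = score, i
        elif best_left - score > x_drop:
            break
    total = k * match + best_left + best_right
    return total, q - left, q + k + right, d - left


def seed_and_extend(query, db, k=4, min_score=6):
    """All distinct ungapped segments scoring at least min_score, best first."""
    index = build_word_index(db, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], ()):
            # Each (q, d) seed is independent work: one GPU thread per seed.
            hit = extend_ungapped(query, db, q, d, k)
            if hit[0] >= min_score:
                hits.add(hit)
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    print(seed_and_extend("ACGTACGGATCC", "TTTTACGTACGGATCCTTTT"))
```

Several seeds on the same diagonal extend to the same segment, which is why hits are collected in a set; a real tool would also skip seeds already covered by an earlier extension.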
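The core dynamic program in HMM-based gene finding is Viterbi decoding: for each position and state, keep the best log-probability of any path ending there, then trace back. The sketch below uses a toy two-state model (noncoding "N" versus GC-rich "C") with made-up probabilities; it is far simpler than the models SNAP or Augustus actually use. Regarding parallelism, the recursion is sequential along the sequence, but the per-state updates at each position, and separate sequences or contigs, can run in parallel.

```python
from math import log

# Toy, made-up parameters: "C" emits G/C more often than "N" does.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "C": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable state path for seq, computed in log space."""
    # V[t][s]: best log-probability of any path that ends in state s at position t.
    V = [{s: log(start_p[s]) + log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor r for state s at position t.
            prev, score = max(
                ((r, V[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            V[t][s] = score + log(emit_p[s][seq[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    seq = "AATATTGCGCGCGCATATAT"
    print("".join(viterbi(seq, STATES, START, TRANS, EMIT)))
```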

+1 for GPU-blastn. I've been trying to get the current blastx running on CUDA, with a considerable amount of difficulty, but blastn is what I really want!


Thanks for the suggestions. I'm not exactly an expert in GPGPU programming yet, but I will look into blastx and see if it is something I can work towards.

9.5 years ago

Hello,

There are currently no good publicly available tools for identifying footprints in DNase-seq data. The algorithm is pretty straightforward: look within regions of DNase hypersensitivity for short segments of DNA protected from cleavage by bound transcription factors. It can be done in parallel, with many nodes inspecting different locations simultaneously. If you are interested, let me know and I can help get you started.
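One possible sketch of that footprint search, under stated assumptions: the input is a hypothetical list of per-base DNase I cleavage counts for one hypersensitive region, and a candidate footprint is a core window with few cuts between two flanks with many cuts, scored as the smaller flank mean minus the core mean. The window widths, threshold and scoring rule are illustrative choices, not a published method. Because each region is scored on its own, regions map directly onto separate nodes, processes or GPU thread blocks.

```python
def mean(xs):
    return sum(xs) / len(xs)


def footprint_scores(cuts, core=6, flank=5):
    """(score, start) for every core window that has a full flank on both sides.

    cuts[i] is the (hypothetical) number of cleavage events at base i.
    score = min(left flank mean, right flank mean) - core mean.
    """
    scores = []
    for start in range(flank, len(cuts) - core - flank + 1):
        left = mean(cuts[start - flank:start])
        mid = mean(cuts[start:start + core])
        right = mean(cuts[start + core:start + core + flank])
        scores.append((min(left, right) - mid, start))
    return scores


def best_footprint(cuts, core=6, flank=5, min_score=2.0):
    """Best (start, end, score) in one region, or None if nothing passes."""
    scores = footprint_scores(cuts, core, flank)
    if not scores:
        return None  # region too short for core plus both flanks
    score, start = max(scores)
    return (start, start + core, score) if score >= min_score else None


def scan_regions(regions, **params):
    """Score many regions; every call is independent, so this can be parallelized."""
    return {name: best_footprint(cuts, **params) for name, cuts in regions.items()}


if __name__ == "__main__":
    regions = {
        "dhs_1": [5, 8, 9, 7, 8, 9, 0, 1, 0, 0, 1, 0, 9, 8, 8, 9, 7, 5],
        "dhs_2": [5] * 18,
    }
    print(scan_regions(regions))
```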


That sounds like something I might be interested in.


I am also interested in this concept. I see some relevant publications, but I take it you mean there is no available code yet, right? I am asking because 10 months have passed since this post.

Thanks


Since I wrote this, the following paper was published, and it actually provides software for DNase-seq analysis:

http://nar.oxfordjournals.org/content/41/21/e201.long

I have not tried the software yet, but I did speak with the author of the paper, and it should address many of the common DNase-seq analysis questions.

9.5 years ago

scan_for_matches could really use an overhaul (reimplementation!). Let me know if you are going for this one.

http://blog.theseed.org/servers/2010/07/scan-for-matches.html


Thank you for your suggestion. I'll look into it.