Question

I want to know: what are some of the common problems facing a bioinformatician?

2


8.6 years ago

mohammedrakayby ▴ 20

For example, something you wish were faster, or maybe a tool that doesn't exist at all. I know this is not specific by any means, but I'm aiming to solve such a problem and use it as my graduation project. Any suggestions are welcome.

gene next-gen-sequencing alignment • 2.4k views

updated 19 months ago by Ram 43k • written 8.6 years ago by mohammedrakayby ▴ 20

score 5 · Answer 1 · 2015-09-11



8.6 years ago

venu 7.1k

If you have not searched previous threads yet, these might help you. You can find many more.

Advice For Newcomers To The Bioinformatics Field
What Are The Most Common Stupid Mistakes In Bioinformatics?
and suggestions under 'Similar posts'


Ram · Answer 2 · 2015-09-11

Interestingly, a large number of tools stop working if you provide them with input that is substantially different from what Illumina produces.

FastQC, for example, the workhorse of QC that can churn through billions of reads with ease, will fail with "OutOfMemoryError: GC overhead limit exceeded" on just 10K reads produced by the MinION platform. These reads are, of course, much longer; many are over 60 kb, and that seems to break FastQC.
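As a rough illustration of a design that tolerates long reads, the sketch below streams a FASTQ file one record at a time and bins read lengths on a power-of-two scale, so the summary's own memory stays constant whether reads are 100 bp or 60 kb (only the current record is held). This is not FastQC's code and makes no claim about what exactly triggers its out-of-memory error. The function names are hypothetical, and the parser assumes simple four-line FASTQ records without wrapped sequence lines.

```python
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) one record at a time.

    Assumption: simple 4-line FASTQ records (no line-wrapped sequences)."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].rstrip("\n"), seq, qual


def length_summary(handle: TextIO) -> dict:
    """Summarise read lengths without per-position arrays.

    Bin k counts reads with length in [2**k, 2**(k+1)), so the summary
    has the same size for 100 bp reads as for 60 kb reads. Only the
    record currently being read is held in memory."""
    bins: Counter = Counter()
    count = total = longest = 0
    for _name, seq, _qual in read_fastq(handle):
        n = len(seq)
        count += 1
        total += n
        longest = max(longest, n)
        bins[n.bit_length() - 1 if n > 0 else -1] += 1  # -1 = empty reads
    return {
        "reads": count,
        "bases": total,
        "longest": longest,
        "log2_bins": dict(sorted(bins.items())),
    }


if __name__ == "__main__":
    # One short read and one 60 kb read, built in memory for the demo.
    demo = io.StringIO(
        "@r1\nACGT\n+\nIIII\n"
        "@r2\n" + "A" * 60000 + "\n+\n" + "I" * 60000 + "\n"
    )
    print(length_summary(demo))
```

Binning positions or lengths on a logarithmic scale is one common way to keep summaries bounded when the input length range spans several orders of magnitude; a real QC tool would of course report far more than lengths.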

The prevalence of Illumina-style sequencing has imposed a view of how tools should be optimized and what they are supposed to show and do. Long reads with more errors in them will radically alter how we approach data analysis; most tools we use today may not work all that well (or at all).

Error correction by self-aligning reads to one another and producing a consensus among them is a surprisingly convoluted process when attempted today; that is a topic where one can make good headway. It requires a combination of alignment to select candidate reads, then a multiple sequence alignment between these.
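The two stages described above can be sketched in a heavily simplified form. This is a toy illustration under strong assumptions, not a working error corrector: candidate reads are picked by counting shared k-mers (a cheap stand-in for the overlap alignment, swapped in here for brevity), and the consensus step assumes the reads have already been placed in a gapped multiple alignment of equal-length rows, so it only performs a column-wise majority vote. The function names and the `k` and `min_shared` parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_overlaps(reads: Dict[str, str], k: int = 5,
                       min_shared: int = 3) -> Dict[str, List[str]]:
    """Stage 1 (simplified): for each read, list the other reads that share
    at least min_shared distinct k-mers with it. A real pipeline would
    confirm these candidates with an actual alignment."""
    index: Dict[str, Set[str]] = {}
    for name, seq in reads.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    result: Dict[str, List[str]] = {}
    for name, seq in reads.items():
        shared: Counter = Counter()
        for km in kmers(seq, k):
            for other in index[km]:
                if other != name:
                    shared[other] += 1
        result[name] = sorted(o for o, c in shared.items() if c >= min_shared)
    return result


def column_consensus(aligned_rows: List[str]) -> str:
    """Stage 2 (simplified): majority vote per column of a gapped MSA.

    Assumes the rows are already aligned and of equal length, with '-' for
    gaps. Columns whose majority symbol is a gap are dropped. Ties go to the
    symbol seen first in that column (Counter.most_common ordering)."""
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("aligned rows must have equal length")
    out = []
    for col in range(width):
        symbol, _ = Counter(row[col] for row in aligned_rows).most_common(1)[0]
        if symbol != "-":
            out.append(symbol)
    return "".join(out)


if __name__ == "__main__":
    reads = {"a": "ACGTACGTTG", "b": "ACGTACGATG", "c": "TTTTTTTTTT"}
    print(candidate_overlaps(reads))
    print(column_consensus(["ACGT-A", "ACCT-A", "ACGTTA"]))
```

The hard part the answer alludes to is exactly what this sketch skips: producing a good multiple alignment of noisy, partially overlapping long reads in the first place.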

score 3 · Answer 3 · 2015-09-12

Visualization. Visualizing data is necessary to get a feel for what the data looks like and to generate hypotheses. In my opinion, there is a need for better ways of visualizing the large, multidimensional datasets in genomics.

Roughly speaking, it seems to me that there are two extremes for displaying genomics data: 1) very localized views of genomic regions, as in genome browsers like UCSC's, IGV, etc., which miss a global perspective (example); or 2) very global, highly condensed xy-plots or aggregated profiles (example), which inevitably hide interesting characteristics or, worse, pick up artifacts. Circos plots kind of try to fill this gap, but personally I find them quite uninformative (example). I don't have better ways of visualization in mind; I'm just pointing out that there is a strong need.