What are some of the common problems facing a bioinformatician?
4
2
8.6 years ago

Like something you wish were faster, or maybe some tool that doesn't exist at all. I know that is not specific by any means, but I'm aiming to solve such a problem and use it as my graduation project. Any suggestions are welcome.

gene next-gen-sequencing alignment • 2.4k views
5
8.6 years ago
venu 7.1k

If you did not search previous threads, these might help you. You can find many more.

5
8.6 years ago

Interestingly, a large number of tools stop working if you provide them with input that is substantially different from what Illumina produces.

FastQC, for example, the workhorse of QC that can churn through billions of reads with ease, will fail with "OutOfMemoryError: GC overhead limit exceeded" on just 10K reads produced by the MinION platform. These reads are of course much longer (many are over 60 kb), and that seems to break FastQC.
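A quick way to see how far a dataset departs from Illumina-style input is to summarize read lengths before handing it to other tools. The snippet below is only a sketch in Python: the names `read_lengths` and `summarize` and the `long_cutoff` parameter are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
import io
import statistics

def read_lengths(handle):
    """Yield sequence lengths from a FASTQ stream.

    Assumes exactly 4 lines per record, so the sequence is always
    the second line of each record.
    """
    for line_no, line in enumerate(handle):
        if line_no % 4 == 1:
            yield len(line.strip())

def summarize(lengths, long_cutoff=60_000):
    """Collect a few basic statistics on read lengths."""
    lengths = list(lengths)
    return {
        "reads": len(lengths),
        "max": max(lengths),
        "median": statistics.median(lengths),
        "over_cutoff": sum(length > long_cutoff for length in lengths),
    }

# Tiny in-memory example; a real file would be opened with open(path).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(summarize(read_lengths(example), long_cutoff=5))
```

This does not fix anything in FastQC itself; it just tells you whether you are feeding a tool reads far longer than it was probably tuned for.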

The prevalence of Illumina-style sequencing has imposed a view of how tools should be optimized and what they are supposed to show and do. Long reads with more errors in them will radically alter how we approach data analysis - most tools we use today may not work all that well (or at all).

Error correction by aligning reads to one another and building a consensus from them is a surprisingly convoluted process when attempted today; that is a topic where one can make good headway. It requires a combination of alignment to select candidate reads, followed by a multiple sequence alignment among those candidates.
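To make the two steps concrete, here is a toy sketch in Python. It is not how any real corrector works, and it swaps in simpler techniques: candidate selection uses shared k-mers instead of a true alignment, and the "multiple alignment" is approximated by globally aligning each candidate to the target read (Needleman-Wunsch) and taking a per-position majority vote. All function names and parameters are hypothetical. Bases that are missing from the target are ignored, so this can only fix substitutions and spurious extra bases.

```python
from collections import Counter

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def select_candidates(target, reads, k=5, min_shared=3):
    """Step 1 (k-mer stand-in for alignment): keep reads that overlap the target."""
    target_kmers = kmers(target, k)
    return [r for r in reads if len(kmers(r, k) & target_kmers) >= min_shared]

def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns both sequences with '-' gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def correct_read(target, reads, k=5, min_shared=3):
    """Step 2: majority vote per target position across aligned candidates."""
    votes = [Counter({base: 1}) for base in target]  # the target votes for itself
    for read in select_candidates(target, reads, k, min_shared):
        gapped_target, gapped_read = global_align(target, read)
        pos = 0
        for t, r in zip(gapped_target, gapped_read):
            if t == "-":
                continue  # base absent from the target: ignored in this sketch
            votes[pos][r] += 1  # r may be '-', i.e. a vote to delete this base
            pos += 1
    winners = (v.most_common(1)[0][0] for v in votes)
    return "".join(base for base in winners if base != "-")

noisy = "ACGTACGATGCAAGCT"            # one wrong base at index 7
others = ["ACGTACGTTGCAAGCT",
          "ACATACGTTGCAAGCT",         # carries its own error at index 2
          "ACGTACGTTGCAAGCT",
          "GGGGGGGGGGGGGGGG"]         # unrelated read, filtered out in step 1
print(correct_read(noisy, others))    # ACGTACGTTGCAAGCT
```

Even this toy version shows where the pain is: the alignment is quadratic in read length, which gets expensive fast for reads that are tens of kilobases long, and a real solution needs a proper multiple alignment to handle insertions.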

ADD COMMENT
1

I do not know which area the OP is going to work in, but these points have become some of the important ones in my notes. Thank you.

3
8.6 years ago

Visualization. Visualizing data is necessary to get a feel for what the data look like and to generate hypotheses. In my opinion, there is a need for better ways of visualizing the large, multidimensional datasets in genomics.

Roughly speaking, it seems to me that there are two extremes for displaying genomics data: 1) very localized views of genomic regions, as in genome browsers like UCSC's, IGV, etc., which miss a global perspective (example); or 2) very global, highly condensed xy-plots or aggregated profiles (example), which inevitably hide interesting characteristics or, worse, pick up artifacts. Circos plots try to fill this gap, but personally I find them quite uninformative (example). I don't have better ways of visualization in mind; I'm just pointing out that there is a strong need.
