Question

Alignment & Conserved Element

9.4 years ago

bailliecharles • 0

Hi All,

I'm very new to bioinformatics, but am keen to learn (I need to)! I'm hoping someone can at least point me in the right direction, as the learning resources I have come across are either very basic or way too technical.

I want to identify conserved noncoding elements within crustacean genomes for use in a phylogenomic study. Here's what I was thinking: do a pairwise alignment of two genomes, extract the conserved regions, remove duplicates, then BLAST the results against other crustacean genomes (and maybe other arthropods too) as a kind of validation. So, my questions are:

Am I on the right track?
How do I do the alignment? I have done many before, but only for very short regions, never a whole genome. Is this even possible, or do I need to break it down? The assemblies I have found appear to be in draft form, so if I do cut them into manageable chunks, how do I know a particular set of contigs is the same in both species?
How do I know which of the final set of conserved elements are non-coding if there is no reference to use?

Any help hugely appreciated!

C

conserved-element genome alignment • 2.1k views

updated 3.0 years ago by Ram 45k • written 9.4 years ago by bailliecharles • 0

Ram · Answer 1 · 2016-02-23

I don't think these kinds of noncoding conserved elements are long enough to BLAST successfully against a third genome. The best way may be to have a multiple genome alignment, but this can be very complicated to build. Alternatively, you could build several pairwise genome alignments. This is complicated too, but less so. I would use LAST for the pairwise alignments, but you will need to post-process the results to identify the best hits among the millions of alignments. I did this once and it took me a lot of time to write and tune the scripts. It's a hard problem to start with! I hope someone else comes up with an easier way of doing this.

Are these genomes in UCSC? If they are, UCSC may have the pairwise alignments already calculated. It's much easier to work with mammalian than with crustacean genomes!
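
To give a rough idea of what picking the best hits out of a pile of pairwise alignments can look like, here is a minimal Python sketch. It is not the script I actually wrote, and it does not read LAST output directly. The `Alignment` record layout, the `best_hits` function and the `max_overlap_fraction` threshold are all hypothetical names made up for illustration, and it assumes the alignments have already been parsed into (query, start, end, target, start, end, score) records. The idea is a simple greedy filter: go through the alignments from highest to lowest score and drop any alignment whose query interval is mostly covered by one that was already kept.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """One local alignment (hypothetical record layout, not a LAST format)."""
    query: str
    q_start: int  # 0-based, inclusive
    q_end: int    # 0-based, exclusive
    target: str
    t_start: int
    t_end: int
    score: int


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def best_hits(alignments: List[Alignment],
              max_overlap_fraction: float = 0.5) -> List[Alignment]:
    """Greedy filter: keep high-scoring alignments first, and drop any
    alignment that overlaps an already-kept one on the same query by more
    than max_overlap_fraction of its own length (assumed threshold)."""
    kept: List[Alignment] = []
    for aln in sorted(alignments, key=lambda a: a.score, reverse=True):
        length = aln.q_end - aln.q_start
        if length <= 0:
            continue  # skip malformed records
        clashes = any(
            k.query == aln.query
            and overlap(aln.q_start, aln.q_end, k.q_start, k.q_end)
            > max_overlap_fraction * length
            for k in kept
        )
        if not clashes:
            kept.append(aln)
    # Report in query order so neighbouring hits are easy to inspect.
    return sorted(kept, key=lambda a: (a.query, a.q_start))


# Toy data with made-up contig/scaffold names.
example = [
    Alignment("contigA", 100, 300, "scaf1", 5000, 5200, 180),
    Alignment("contigA", 120, 290, "scaf7", 800, 970, 90),
    Alignment("contigA", 280, 500, "scaf1", 5180, 5400, 150),
    Alignment("contigB", 0, 150, "scaf2", 40, 190, 120),
]
filtered = best_hits(example)
for aln in filtered:
    print(aln.query, aln.q_start, aln.q_end, aln.target, aln.score)
```

In the toy data, the second alignment is dropped because the first, higher-scoring one covers it entirely, while the third is kept because it only overlaps the first by 20 bases. This version compares every alignment with every kept one, so it will be far too slow for millions of alignments as written. For real data you would need some kind of interval index, and the threshold would need tuning.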