I'm very new to bioinformatics but keen to learn (I need to)! I'm hoping someone can at least point me in the right direction, as the learning resources I have come across are either very basic or far too technical.
I want to identify conserved non-coding elements within crustacean genomes for use in a phylogenomic study. Here's what I was thinking: do a pairwise alignment of two genomes, keep only the conserved regions, remove duplicates, then BLAST the results against other crustacean genomes (and maybe other arthropods too) as a kind of validation. So, my questions are:
Am I on the right track?
How do I do the alignment? I have done many before, but only of very short regions, never a whole genome. Is this even possible, or do I need to break it down? The assemblies I have found appear to be in draft form, so if I do cut them into manageable chunks, how do I know that a particular set of contigs corresponds to the same region in both species?
How do I know which of the final set of conserved elements are non-coding if there is no reference to use?
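In case it helps to make the "keep conserved regions, remove duplicates" step concrete, here is a rough Python sketch of what I have in mind. Everything in it is an assumption on my part: the `Block` layout, the function names, and the length and identity thresholds are placeholders, not the output format or defaults of any particular aligner.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One aligned block from a pairwise genome alignment (hypothetical layout)."""
    contig: str   # contig name in genome A
    start: int    # start coordinate on that contig
    end: int      # end coordinate on that contig
    seq_a: str    # aligned sequence from genome A, gaps written as '-'
    seq_b: str    # aligned sequence from genome B, same length as seq_a


def identity(block: Block) -> float:
    """Fraction of alignment columns where both genomes have the same base."""
    if not block.seq_a:
        return 0.0
    matches = sum(
        a != "-" and a.upper() == b.upper()
        for a, b in zip(block.seq_a, block.seq_b)
    )
    return matches / len(block.seq_a)


def conserved_blocks(blocks: List[Block],
                     min_len: int = 100,
                     min_identity: float = 0.9) -> List[Block]:
    """Keep long, highly similar blocks and drop exact duplicate sequences.

    min_len and min_identity are placeholder thresholds, not recommendations.
    """
    seen = set()
    kept = []
    for block in blocks:
        if len(block.seq_a) < min_len or identity(block) < min_identity:
            continue
        # Use the ungapped genome-A sequence as the duplicate key.
        key = block.seq_a.replace("-", "").upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append(block)
    return kept
```

The blocks that survive would then be written out as FASTA and used as BLAST queries against the other genomes. This only catches exact duplicates; overlapping or near-identical blocks would need something smarter.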
Any help hugely appreciated!