I have been given exome data to work with this summer. It consists of exome data from several species, all mapped to a single provided reference genome (zebrafish). My task is to collect the CDS of genes of interest (done; I collected them from GenBank and Ensembl) and then find those regions of interest in the exome data.

I have no idea how to go about this, as this is my first time working with exons. I also don't understand the sequence labels, though I think they are positions in the genome?

For example, a label and its sequence can look like this:

>7;10419000-188000_1-7:1900_40
------AACTGA----GTACGA....GTACCTA--------


The "...." just means the sequence goes on for a while; the hyphens, however, are part of the sequences and denote gaps.

I think the example label refers to exon 7 (of a protein that I still have to identify), but the coordinates don't make sense to me.
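To make the record layout concrete, here is a minimal Python sketch of how such a file could be read, assuming it is FASTA-like (a `>` label line followed by one or more sequence lines) and that `-` is the only gap character. The function names `read_gapped_fasta` and `strip_gaps` are made up for illustration, and the toy record keeps only the two visible ends of my example, since the middle of the sequence is omitted above. The label is kept as an opaque string because I do not know how to interpret its fields.

```python
from typing import Dict, Iterable


def read_gapped_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each '>' label to its sequence (gaps kept, lines joined)."""
    records: Dict[str, str] = {}
    label = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            label = line[1:]  # label stored as-is, not interpreted
            records[label] = ""
        elif label is not None:
            records[label] += line
    return records


def strip_gaps(seq: str) -> str:
    """Remove alignment gap characters (assumed to be '-')."""
    return seq.replace("-", "")


# Toy input: only the visible ends of the example record.
example = [
    ">7;10419000-188000_1-7:1900_40",
    "------AACTGA----GTACGA",
    "GTACCTA--------",
]
records = read_gapped_fasta(example)
ungapped = {label: strip_gaps(seq) for label, seq in records.items()}
```

The ungapped sequences are what would be compared against a CDS; the gapped ones presumably only make sense relative to the other sequences in the same alignment.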

I want to find all the exons in my exome data for species 1 that are associated with zebrafish gene A, which I should be able to do once I figure out how to map these exons in the first place.
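As a rough illustration of what I mean by finding the exons for a gene, here is a naive Python sketch that ranks exome records by how many k-mers they share with a CDS. This is only an assumption about how the matching could be approached; a real search would more likely use an alignment tool such as BLAST. The function names, the labels `exon_a` and `exon_b`, the sequences, and the choice of `k` are all hypothetical, and only the forward strand is checked.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(
    cds: str, records: Dict[str, str], k: int = 8
) -> List[Tuple[str, int]]:
    """Rank records by the number of k-mers shared with the CDS.

    Gaps ('-') are removed from each record before comparison.
    Records sharing no k-mer with the CDS are left out.
    """
    query = kmers(cds, k)
    hits = []
    for label, gapped in records.items():
        shared = len(query & kmers(gapped.replace("-", ""), k))
        if shared:
            hits.append((label, shared))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
cds = "ATGAACTGAGTACGATT"
toy_records = {
    "exon_a": "--AACTGAGTACGA--",
    "exon_b": "--CCCCGGGGTTTT--",
}
hits = rank_by_shared_kmers(cds, toy_records, k=6)
```

In this toy case only `exon_a` is returned, because `exon_b` shares no 6-mer with the CDS. Across species an exact k-mer match is a strict criterion, so a smaller `k` or a proper aligner would probably be needed.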

Any advice would be appreciated. This is the only step I am stuck on, and I have been stuck on it for over a week. I apologize if this has been asked before, but the other threads I have read do not seem to cover my case.

