We have sequenced a wild rice genome. However, for now the data is just a lot of nucleic acid sequence to us; we have no other information beyond the sequence itself. I want to find where a gene family (for example, the MADS-box family) is located in the genome. How can I locate the gene family on the chromosomes? Thank you in advance for your help!
A simple BLAST search (blastx) of your new genome against the known proteins from rice should be enough to find the matching regions in your new sequences. A more refined analysis can be done by training a profile model with HMMER and searching with it across your contigs/sequences.
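As a rough illustration of the BLAST step above, here is a minimal Python sketch that reads blastx tabular output (-outfmt 6), keeps hits below an E-value cutoff, and merges nearby hits on the same contig and strand into candidate loci. The file names in the comments, the contig and protein IDs, the example rows, and the cutoff values are all made up for illustration; only the 12-column layout is the standard BLAST tabular format.

```python
import csv
import io
from collections import defaultdict

# Assumed upstream commands (file names are hypothetical):
#   makeblastdb -in rice_mads_proteins.fasta -dbtype prot -out rice_mads
#   blastx -query wild_rice_contigs.fasta -db rice_mads -outfmt 6 -evalue 1e-10 -out hits.tsv

# The 12 default columns of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle, max_evalue=1e-10):
    """Yield (contig, start, end, strand, protein) for hits passing the cutoff."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        qstart, qend = int(hit["qstart"]), int(hit["qend"])
        # In blastx output, qstart > qend means the hit is on the reverse strand.
        strand = "+" if qstart <= qend else "-"
        yield hit["qseqid"], min(qstart, qend), max(qstart, qend), strand, hit["sseqid"]


def merge_loci(hits, max_gap=5000):
    """Merge hits on the same contig and strand that lie within max_gap bp."""
    by_key = defaultdict(list)
    for contig, start, end, strand, protein in hits:
        by_key[(contig, strand)].append((start, end, protein))
    loci = []
    for (contig, strand), intervals in sorted(by_key.items()):
        intervals.sort()
        cur_start, cur_end, first_protein = intervals[0]
        proteins = {first_protein}
        for start, end, protein in intervals[1:]:
            if start - cur_end <= max_gap:
                cur_end = max(cur_end, end)
                proteins.add(protein)
            else:
                loci.append((contig, cur_start, cur_end, strand, sorted(proteins)))
                cur_start, cur_end, proteins = start, end, {protein}
        loci.append((contig, cur_start, cur_end, strand, sorted(proteins)))
    return loci


# Made-up example rows standing in for a real hits.tsv.
EXAMPLE = (
    "contig_12\tMADS_A\t78.5\t60\t13\t0\t1000\t1179\t1\t60\t1e-30\t120\n"
    "contig_12\tMADS_B\t70.0\t58\t17\t0\t1010\t1183\t2\t59\t1e-25\t100\n"
    "contig_12\tMADS_A\t65.0\t40\t14\t0\t3000\t3119\t70\t109\t1e-12\t60\n"
    "contig_40\tMADS_B\t80.0\t55\t11\t0\t9165\t9001\t3\t57\t1e-28\t110\n"
    "contig_40\tMADS_A\t30.0\t20\t14\t0\t500\t559\t5\t24\t0.5\t20\n"
)

loci = merge_loci(read_hits(io.StringIO(EXAMPLE)))
for contig, start, end, strand, proteins in loci:
    print(contig, start, end, strand, ",".join(proteins))
```

With a real run you would replace the io.StringIO(EXAMPLE) handle with open("hits.tsv"). The coordinates you get back are positions on your contigs or scaffolds, so they only become chromosome positions once the assembly itself is anchored to chromosomes.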
Locating a gene or genes belonging to a specific gene family will be difficult unless you have some kind of markers or anchors to the O. sativa genome. A simple sequence similarity search, such as BLASTN, may identify a gene as a MADS-box transcriptional regulator, but it won't tell you where that gene maps in your wild rice genome.
Ideally, you would like to have markers on known linkage groups or some way to tie the O. sativa genome to yours, or vice versa. Synteny (conserved gene order) may be one way to do this, but actual genetic markers mapped to linkage groups will help more.
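To make the idea of tying the two genomes together a bit more concrete, here is a minimal sketch that assigns each wild rice contig to an O. sativa chromosome by majority vote over anchor genes (for example, best hits of contig genes against annotated O. sativa genes). This is a crude stand-in for a real synteny analysis, not a replacement for mapped genetic markers. The function name, the thresholds, and the anchor data are all hypothetical.

```python
from collections import Counter, defaultdict


def assign_contigs(anchors, min_anchors=3, min_fraction=0.6):
    """Assign contigs to reference chromosomes by majority vote.

    anchors: iterable of (contig, reference_chromosome, reference_position).
    Returns {contig: (chromosome, min_position, max_position)} for contigs
    whose anchors agree well enough; ambiguous contigs are left out.
    """
    by_contig = defaultdict(list)
    for contig, chrom, pos in anchors:
        by_contig[contig].append((chrom, pos))

    placements = {}
    for contig, hits in by_contig.items():
        counts = Counter(chrom for chrom, _ in hits)
        best_chrom, n_best = counts.most_common(1)[0]
        # Require enough anchors, and that most of them agree on one chromosome.
        if n_best >= min_anchors and n_best / len(hits) >= min_fraction:
            positions = sorted(pos for chrom, pos in hits if chrom == best_chrom)
            placements[contig] = (best_chrom, positions[0], positions[-1])
    return placements


# Made-up anchors: (wild rice contig, O. sativa chromosome, position in bp).
example_anchors = [
    ("contig_12", "Chr3", 1_200_000),
    ("contig_12", "Chr3", 1_250_000),
    ("contig_12", "Chr3", 1_310_000),
    ("contig_12", "Chr7", 500_000),    # a stray hit, e.g. a paralog
    ("contig_40", "Chr1", 8_000_000),
    ("contig_40", "Chr1", 8_040_000),  # only two anchors: too few to place
]

placements = assign_contigs(example_anchors)
print(placements)
```

A contig placed this way gives only an approximate, inferred location for any MADS-box gene it carries, and rearrangements between the two species can make the inference wrong, which is why real linkage-group markers are preferable.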