I want to analyse the genetic context of my gene across hundreds of genomes. For each genome, I have the sequence 5 kb upstream and 5 kb downstream of the gene. I have tried Mauve, but it doesn't seem to handle this many sequences at once.
My plan is as follows:
- Identify conserved fragments of DNA (coding or non-coding) within the sequence
- Group sequences together that have those same fragments
- Use Mauve to analyse each smaller group of more similar sequences
I'm not sure how to tackle the first two steps. A global alignment program (MAFFT) doesn't work here because it runs out of memory on my machine (8 GB of RAM). Does anyone have a suggestion?