Question

Analyzing genetic context/gene synteny of hundreds of sequences.

9.4 years ago

pawlowac ▴ 80

Hi everyone,

I'm looking to analyse the genetic context in which my gene is found across hundreds of genomes. I have the sequence 5 kb upstream and downstream of my gene. I have tried Mauve, but it doesn't seem to handle this number of sequences at once.

My thought process is as follows:

1. Identify conserved fragments of DNA (coding or non-coding) within the sequence
2. Group sequences together that have those same fragments
3. Use Mauve to analyze a smaller number of more similar sequences

I'm not quite sure how to tackle steps 1 and 2. A global-alignment program (MAFFT) doesn't work here because I run out of memory (I have 8 GB of RAM). Does anyone have a suggestion?
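
For steps 1 and 2, one memory-light option is an alignment-free comparison: break each 10 kb region into k-mers, score pairs of regions by the fraction of k-mers they share (Jaccard index), and greedily assign each region to the first group whose representative is similar enough. The following is only a minimal sketch under stated assumptions. The function names, `k=11`, and the `0.5` threshold are illustrative choices, not tuned values, and the code ignores reverse-complement matches, so all regions would need to be oriented on the same strand first.

```python
def kmer_set(seq, k=11):
    """Return the set of all k-mers in seq (uppercased, forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets: shared k-mers / all distinct k-mers."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_groups(seqs, k=11, threshold=0.5):
    """Group sequences by k-mer similarity.

    seqs: dict mapping a sequence name to its DNA string (hypothetical input
    layout). Each sequence joins the first group whose representative (the
    group's first member) has a Jaccard index >= threshold; otherwise it
    starts a new group.
    """
    groups = []  # list of (representative k-mer set, [member names])
    for name, seq in seqs.items():
        kmers = kmer_set(seq, k)
        for rep_kmers, members in groups:
            if jaccard(kmers, rep_kmers) >= threshold:
                members.append(name)
                break
        else:
            groups.append((kmers, [name]))
    return [members for _, members in groups]
```

Each resulting group could then be passed to Mauve separately (step 3). Because the grouping is greedy, the result depends on input order; it is a way to split the problem into manageable batches, not a substitute for a proper clustering.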

synteny mauve genetic-context sequence-comparison • 2.4k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.4 years ago by pawlowac ▴ 80

How about identifying all RefSeq genomes that have the same gene and retrieving the annotations within ±5 kb in those genomes? This wouldn't be computationally demanding and would probably be relatively easy to achieve with, e.g., BLAST against refseq_genomic and then some Entrez Direct magic.

ADD REPLY • link 9.4 years ago by 5heikki 11k

I've used an ebot (efetch) Perl script to download all genomes associated with my protein GI numbers. Then, using Biopython, I've been able to extract annotations for ±5 kb around my gene of interest. Do you have a suggestion for automatically comparing the sequences?
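
Once the annotations are extracted, the neighbourhoods can already be compared at the annotation level before any sequence comparison. The sketch below assumes each feature has been reduced to a plain `(start, end, strand, product)` tuple with 0-based, half-open coordinates; that layout and the function names are hypothetical, not a Biopython API. It lists the products found within the flank on either side of the target gene, reads them in the direction of the target's strand, and groups genomes that share exactly the same ordered context.

```python
from collections import defaultdict


def neighbours(features, target, flank=5000):
    """Return products of features overlapping the window around target.

    features: list of (start, end, strand, product) tuples (assumed layout).
    target: the tuple for the gene of interest, taken from the same list.
    The result is ordered along the target's strand, so the same context
    reads identically whether the locus is on the + or - strand.
    """
    t_start, t_end, t_strand, _ = target
    lo, hi = t_start - flank, t_end + flank
    hits = sorted(f for f in features
                  if f != target and f[0] < hi and f[1] > lo)
    products = [f[3] for f in hits]
    return tuple(products[::-1] if t_strand == -1 else products)


def group_by_context(contexts):
    """Group genome names by their ordered neighbour tuple.

    contexts: dict mapping genome name -> tuple returned by neighbours().
    """
    groups = defaultdict(list)
    for genome, context in contexts.items():
        groups[context].append(genome)
    return dict(groups)
```

This only works as well as the annotations do: the same gene annotated under two different product names will split a group, so some normalisation of names (or clustering of the neighbouring proteins) would likely be needed on real data.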

ADD REPLY • link 9.4 years ago by pawlowac ▴ 80

What do you hope to achieve from comparing the sequences that you did not find out from comparing the annotations?

ADD REPLY • link 9.4 years ago by 5heikki 11k

I hope to identify potential sites of recombination, compare the sequence identity of the surrounding genes, and compare the average mutation rate of the region surrounding the target genes with that of the target genes themselves.
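
As a rough illustration of the last comparison, the proportion of differing sites (p-distance) between two already-aligned sequences can serve as a crude proxy for divergence. It is not a mutation rate in itself, since it ignores multiple substitutions at a site and divergence time. The function name and the choice to skip gap and `N` columns are assumptions made for this sketch.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    a, b: equal-length aligned strings. Columns where either sequence has a
    gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared
```

Computing this separately for the aligned target gene and for the aligned flanking region of the same genome pair gives two numbers to compare; a flank that is much more (or much less) divergent than the gene would be one hint worth following up with a dedicated recombination test.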

ADD REPLY • link 9.4 years ago by pawlowac ▴ 80

You don't say whether you are looking at the population level, a comparison of closely related species, or comparisons across a wider range of taxa.

ADD REPLY • link 9.2 years ago by h.mon 35k