Question

Testing For Significance Of Observed Synteny

12.4 years ago

Eric Normandeau 11k

Hi,

We have built a strongly supported map from RAD markers in a fish species and wanted to assess whether our map showed strong synteny with the published Danio rerio genome. Up to now, we used a somewhat crude method, but it gives very strong support for the synteny.

I blasted all the markers (RAD tag sequences of just under 100 bp) against the Danio transcriptome (blastx). I then counted how many times each of our linkage groups hit each Danio chromosome. With this approach, we get a very non-random pattern in which each of our linkage groups hits strongly on one to three Danio chromosomes, suggesting very good synteny. For visualization, I then represent the result as a bubble plot in which the size of each bubble represents the number of BLAST hits linking a linkage group (y axis) to a chromosome (x axis).
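For concreteness, the counting step can be sketched roughly as follows. This is only an illustration: it assumes the BLAST output has already been reduced to one best hit per marker, given as a (linkage group, chromosome) pair. The function names and the example data are made up and are not from our actual pipeline.

```python
from collections import Counter


def count_hits(hits):
    """Count hits per (linkage group, chromosome) pair.

    `hits` is an iterable of (linkage_group, chromosome) tuples,
    one per marker (assumed: best BLAST hit only).
    """
    return Counter(hits)


def top_chromosomes(counts, linkage_group, n=3):
    """Return the n chromosomes hit most often by one linkage group."""
    per_chrom = Counter()
    for (lg, chrom), k in counts.items():
        if lg == linkage_group:
            per_chrom[chrom] += k
    return per_chrom.most_common(n)


# Made-up example data, not real results.
hits = [(1, 1), (1, 2), (1, 2), (1, 2), (2, 5), (2, 5), (2, 7)]
counts = count_hits(hits)
best_for_lg1 = top_chromosomes(counts, 1, n=1)
```

The resulting `counts` mapping holds the same information as the Count / Us / Danio table further down, and is what the bubble plot is drawn from.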

What I would like to know now is: is there an accepted method for assessing the strength of the synteny, ideally one that returns a p-value?

I can get finer-grained data from the BLAST results, but for now I am only counting the number of times each of our linkage groups hits each Danio chromosome, e.g.:

Count  Us  Danio
3      1   1
5      1   2
1      1   3
...
8      28  24
0      28  25

One quick and dirty method I was thinking about is to use a bootstrap process and see how far our observed distribution lies from the mass of a randomly generated distribution. I am not sure which estimator of the distribution I would use for that, but it may not be too hard to think of a good one.
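To make the idea concrete, here is a minimal sketch of what I have in mind. Strictly speaking it is a permutation test rather than a bootstrap: the chromosome labels are shuffled across markers, which keeps the size of each linkage group and the total number of hits per chromosome fixed. The test statistic (the fraction of markers hitting the most-hit chromosome of their linkage group) is just one arbitrary choice among many, and the data are invented for illustration.

```python
import random
from collections import Counter


def concentration(lgs, chroms):
    """Fraction of markers hitting the most-hit chromosome of their linkage group."""
    counts = Counter(zip(lgs, chroms))
    best = {}
    for (lg, _chrom), k in counts.items():
        best[lg] = max(best.get(lg, 0), k)
    return sum(best.values()) / len(lgs)


def permutation_p_value(lgs, chroms, n_perm=1000, seed=0):
    """One-sided permutation p-value for the concentration statistic."""
    rng = random.Random(seed)
    observed = concentration(lgs, chroms)
    shuffled = list(chroms)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between marker and chromosome
        if concentration(lgs, shuffled) >= observed:
            as_extreme += 1
    # +1 in numerator and denominator so the p-value is never exactly 0
    return observed, (as_extreme + 1) / (n_perm + 1)


# Invented data: 3 linkage groups of 10 markers, each hitting mostly one chromosome.
lgs = [1] * 10 + [2] * 10 + [3] * 10
chroms = [1] * 9 + [2] + [2] * 9 + [3] + [3] * 9 + [1]
observed, p_value = permutation_p_value(lgs, chroms)
```

The smallest p-value this can report is 1 / (n_perm + 1), so the number of permutations sets the resolution of the test.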

I would love to have your thoughts on existing methods.

Cheers

EDIT: A friend suggested doing a linear regression on the (x, y) coordinates of Genome1 vs. Genome2 positions. It seems a logical way to check for synteny, but I wonder what to test it against to show that the result is really significant. The p-value of the regression itself seems a bit arbitrary as a basis for concluding whether or not there is synteny.
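One possibility for "what to test against" would be to compare the observed correlation with correlations obtained after shuffling the positions. The sketch below does this for the markers of a single linkage group that hit a single Danio chromosome. It is only an assumption about how such a test could be set up: it uses Pearson's r instead of a full regression, takes the absolute value so that an inverted segment still counts as collinear, and runs on invented positions (cM on our map, bp on the Danio chromosome).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinearity_p_value(map_pos, genome_pos, n_perm=2000, seed=0):
    """Permutation p-value for |r| between map and genome positions."""
    rng = random.Random(seed)
    observed = abs(pearson_r(map_pos, genome_pos))
    shuffled = list(genome_pos)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random marker order along the chromosome
        if abs(pearson_r(map_pos, shuffled)) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Invented positions for 8 markers: cM on our map, bp on one Danio chromosome.
map_pos = [0.0, 5.2, 11.0, 18.5, 24.1, 30.7, 41.3, 52.0]
genome_pos = [1.1e6, 3.0e6, 2.4e6, 9.8e6, 12.5e6, 15.0e6, 21.7e6, 30.2e6]
r_abs, p_value = collinearity_p_value(map_pos, genome_pos)
```

Note that this addresses conserved marker order within a chromosome, which is a stricter question than whether a linkage group maps to the same chromosome at all.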

As always, your thoughts are appreciated!

updated 12.4 years ago by Larry_Parnell 16k • written 12.4 years ago by Eric Normandeau 11k

Michael Kuhn · Answer 1 · 2011-12-20

I am not sure that you can adequately define a null hypothesis in a test for significance of synteny. Testing against the opposite of synteny, that is, a randomized gene order when comparing your genome/gene order to D. rerio, therefore makes sense. It seems to me that your idea of bootstrapping and comparing distributions is sound. You could also run D. rerio against itself, with BLAST queries that are similar in size, distribution, etc. to your original queries, in order to see what perfect synteny looks like. Then, perhaps, scramble the map assignments of the D. rerio queries and rerun, again and again, to convince yourself that what you see is synteny.

Quantifying synteny is difficult, as I'm sure you know.