Hi,
We have built a well-supported linkage map from RAD markers in a fish species and want to assess whether it shows strong synteny with the published Danio rerio genome. So far we have used a somewhat crude method, but it gives very strong support for synteny.
I blasted all the markers (RAD tag sequences of just under 100 bp) against the Danio transcriptome (blastx). I then counted how many times markers from each of our linkage groups hit each Danio chromosome. With this approach we get a very non-random pattern in which each of our linkage groups hits mainly one to three Danio chromosomes, suggesting very good synteny. For visualization, I represent the result as a bubble plot in which the size of each bubble is the number of BLAST hits linking a linkage group (y axis) to a chromosome (x axis).
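To make the counting step concrete, here is a minimal sketch in Python. It assumes tabular BLAST output (12 tab-separated columns, with the bit score last) and keeps only the best-scoring hit per marker, which is one possible choice and not necessarily how the counts above were produced. The marker names, transcript names, and both lookup tables are made up for illustration.

```python
from collections import Counter

# Hypothetical tabular BLAST lines: query, subject, ..., evalue, bitscore.
blast_lines = [
    "marker_001\ttx_A\t92.0\t30\t2\t0\t1\t90\t10\t39\t1e-10\t60.1",
    "marker_001\ttx_B\t85.0\t28\t4\t0\t4\t87\t50\t77\t1e-05\t40.0",
    "marker_002\ttx_A\t95.0\t31\t1\t0\t1\t93\t60\t90\t1e-12\t65.3",
    "marker_003\ttx_C\t90.0\t29\t3\t0\t2\t88\t5\t33\t1e-09\t55.8",
]

# Hypothetical lookup tables: marker -> linkage group, transcript -> chromosome.
marker_to_lg = {"marker_001": 1, "marker_002": 1, "marker_003": 2}
transcript_to_chrom = {"tx_A": 1, "tx_B": 7, "tx_C": 2}


def count_lg_chrom_hits(lines, marker_to_lg, transcript_to_chrom):
    """Count best hits per (linkage group, chromosome) pair."""
    best = {}  # marker -> (bitscore, subject) of its best-scoring hit
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)

    counts = Counter()
    for marker, (_, subject) in best.items():
        # Skip markers or transcripts with no known location.
        if marker in marker_to_lg and subject in transcript_to_chrom:
            pair = (marker_to_lg[marker], transcript_to_chrom[subject])
            counts[pair] += 1
    return counts


counts = count_lg_chrom_hits(blast_lines, marker_to_lg, transcript_to_chrom)
for (lg, chrom), n in sorted(counts.items()):
    print(n, lg, chrom)
```

Counting every hit instead of the best one per marker would inflate cells for markers that match gene families, so the choice affects any downstream test.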
What I would like to know now is: is there an accepted method for assessing the strength of the synteny, ideally one that returns a p-value?
I can get finer-grained data from the BLAST results, but for now I am only counting the number of times each of our linkage groups hits each Danio chromosome, e.g.:
Count  Us (linkage group)  Danio (chromosome)
3      1                   1
5      1                   2
1      1                   3
...
8      28                  24
0      28                  25
One quick and dirty method I was thinking about is to use a bootstrap-like resampling procedure and see how far our observed distribution lies from the mass of randomly generated distributions. I am not sure which summary statistic of the distribution I would use for that, but it may not be too hard to think of a good one.
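A rough sketch of the resampling idea, under assumptions of my own: strictly speaking this is a permutation test, not a bootstrap, and the Pearson chi-square statistic is just one candidate for the summary statistic. Linkage-group labels are shuffled across markers while the chromosome hits stay fixed, so both sets of totals are preserved. The count rows are toy values in the same (count, linkage group, chromosome) layout as above, not our real data.

```python
import random
from collections import Counter

# Toy (count, linkage_group, danio_chromosome) rows, for illustration only.
rows = [(20, 1, 1), (2, 1, 2), (2, 2, 1), (20, 2, 2)]

# Expand the counts into one (linkage_group, chromosome) pair per BLAST hit.
pairs = [(lg, chrom) for count, lg, chrom in rows for _ in range(count)]


def chi_square_stat(pairs):
    """Pearson chi-square statistic of the linkage group x chromosome table."""
    n = len(pairs)
    cell = Counter(pairs)
    row_tot = Counter(lg for lg, _ in pairs)
    col_tot = Counter(chrom for _, chrom in pairs)
    stat = 0.0
    for lg in row_tot:
        for chrom in col_tot:
            expected = row_tot[lg] * col_tot[chrom] / n
            stat += (cell.get((lg, chrom), 0) - expected) ** 2 / expected
    return stat


def permutation_pvalue(pairs, n_perm=1000, seed=0):
    """Shuffle linkage-group labels to build a null for the statistic."""
    rng = random.Random(seed)
    lgs = [lg for lg, _ in pairs]
    chroms = [chrom for _, chrom in pairs]
    observed = chi_square_stat(pairs)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(lgs)
        if chi_square_stat(list(zip(lgs, chroms))) >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


observed, p_value = permutation_pvalue(pairs)
print(observed, p_value)
```

With many sparse cells, as in a 28 by 25 table, a resampled null like this avoids relying on the asymptotic chi-square distribution, but the smallest p-value it can report is limited by the number of permutations.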
I would love to have your thoughts on existing methods.
Cheers
EDIT: A friend suggested doing a linear regression on the (x, y) coordinates of marker positions in Genome1 vs. Genome2. It seems a logical way to check for synteny, but I wonder what to test it against to show that it is really significant. The p-value of the regression itself seems a bit arbitrary as a basis for concluding whether or not there is synteny.
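One way I could imagine framing the "test it against what?" question, as a sketch only: for the markers of one linkage group that hit one chromosome, compare the observed correlation between map position and genome position with correlations obtained after shuffling the genome positions. The absolute value is used so that an inverted orientation still counts as ordered. The positions below are invented, and this measures conserved marker order within one pair, which is a stronger claim than markers merely sharing a chromosome.

```python
import random

# Invented positions for markers of one linkage group on one chromosome.
map_cm = [0.0, 5.0, 12.0, 20.0, 31.0, 40.0, 52.0, 60.0]    # our map, in cM
genome_mb = [1.0, 3.2, 6.5, 9.9, 15.0, 21.3, 26.0, 30.1]   # Danio, in Mb


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permuted_r_pvalue(xs, ys, n_perm=2000, seed=1):
    """Compare |r| with |r| after shuffling one set of positions."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


r_abs, p_value = permuted_r_pvalue(map_cm, genome_mb)
print(r_abs, p_value)
```

A single large inversion or translocation inside the block would lower the correlation even when synteny is intact, so this would need to be applied per block or replaced by a rank-based measure.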
As always, your thoughts are appreciated!