Question: Non-Random Clusters Of Markers In Genomic Data
didymos210 wrote:

I have count data giving the number of markers at each chromosome position:

  • [0,0,0,1,0,0,0,2,0,0,0,1,1,....]

However, I have 3 or even 4 orders of magnitude fewer markers than available positions, so the data are mostly zeros.

  • My question is how to find clusters of markers with a non-random distribution, e.g. regions that are too dense compared to random positioning.

I have calculated the distribution of pairwise distances between markers and compared it with distances simulated from a random distribution, and the two differ.
I assume that markers are localized in both a random and a non-random fashion, but I am only interested in the non-random clusters.
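
To make the comparison above concrete, here is a minimal sketch of one way it could be set up. It is not the code used for the original analysis: the count vector is a toy example, the function names are hypothetical, distances are taken between adjacent markers only (rather than all pairs), and "random" is assumed to mean uniform placement along the chromosome.

```python
import random

def marker_positions(counts):
    """Expand a per-position count vector into a sorted list of marker positions."""
    return [pos for pos, c in enumerate(counts) for _ in range(c)]

def adjacent_gaps(positions):
    """Distances between consecutive markers (positions must be sorted)."""
    return [b - a for a, b in zip(positions, positions[1:])]

def simulate_gaps(n_markers, n_positions, rng):
    """Gaps for the same number of markers placed uniformly at random."""
    sim = sorted(rng.randrange(n_positions) for _ in range(n_markers))
    return adjacent_gaps(sim)

def short_gap_fraction(gaps, max_gap):
    """Fraction of gaps no longer than max_gap (a simple density statistic)."""
    return sum(g <= max_gap for g in gaps) / len(gaps)

# Toy count vector (an assumption, not real data).
counts = [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 1]
positions = marker_positions(counts)
observed = adjacent_gaps(positions)
obs_frac = short_gap_fraction(observed, max_gap=1)

# Null distribution of the same statistic under uniform placement.
rng = random.Random(0)
null = [
    short_gap_fraction(simulate_gaps(len(positions), len(counts), rng), max_gap=1)
    for _ in range(1000)
]

# Empirical one-sided p-value: how often is random placement at least as dense?
p_value = (1 + sum(f >= obs_frac for f in null)) / (1 + len(null))
print(observed, obs_frac, p_value)
```

A small p-value here only says that short gaps are over-represented overall; it does not say where the clusters are.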

  • I am also looking at how my problem resembles other bioinformatics approaches in sequence analysis (SNPs, HMMs in CpG island discovery, ...) for ideas.
Tags: R, sequence, hmm, random, genomics
written 9.0 years ago by didymos210
brentp23k wrote:

This is an interesting problem. I don't have a great solution, but here's what I've tried in the past. Hopefully others have a more rigorous approach...

The distribution of "stuff" in the genome is already clustered so finding other stuff that's clustered in a different fashion is not trivial (or easy, depending on how you look at it).

You could do a moving average of the count data and look for peaks. Then it's a matter of determining a good window size. You could also use bins (overlapping or otherwise) and find those with a high sum. You could then compare that to randomly-generated + binned data.
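
A rough sketch of the window idea, under some assumptions: the window width of 20, the toy data, and the function names are all made up for illustration, and the random comparison here is a plain shuffle of positions, which ignores any background structure in the genome.

```python
import random

def window_sums(counts, width):
    """Sum of counts in every window of `width` consecutive positions."""
    total = sum(counts[:width])
    sums = [total]
    for i in range(width, len(counts)):
        total += counts[i] - counts[i - width]
        sums.append(total)
    return sums

def null_max_sums(counts, width, n_sims, rng):
    """Largest window sum after shuffling positions, repeated n_sims times."""
    shuffled = list(counts)
    maxima = []
    for _ in range(n_sims):
        rng.shuffle(shuffled)  # destroys any spatial clustering
        maxima.append(max(window_sums(shuffled, width)))
    return maxima

def dense_windows(counts, width, n_sims=200, alpha=0.05, seed=0):
    """Windows whose sum exceeds the (1 - alpha) quantile of the null maxima."""
    rng = random.Random(seed)
    maxima = sorted(null_max_sums(counts, width, n_sims, rng))
    threshold = maxima[min(len(maxima) - 1, int((1 - alpha) * len(maxima)))]
    sums = window_sums(counts, width)
    hits = [(start, s) for start, s in enumerate(sums) if s > threshold]
    return hits, threshold

# Toy data (an assumption): sparse background plus one dense cluster.
counts = [0] * 2000
for pos in range(50, 2000, 100):   # 20 evenly spread background markers
    counts[pos] = 1
for pos in range(1000, 1010):      # 10 markers packed into 10 positions
    counts[pos] = 1

hits, threshold = dense_windows(counts, width=20)
print(threshold, hits[0], hits[-1])
```

Comparing against the maximum over all windows in each shuffle keeps the threshold honest when many windows are tested at once. Overlapping hits would still need to be merged into regions.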

For more realism, you'll want the randomly generated data to have the same auto-correlation that you expect to see in the genome--whatever that might be. I suppose you could report significance with respect to each level of auto-correlation that you use in generating your random data.
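
One possible way to sketch this, with heavy assumptions: the background marker rate is modelled as a log-normal transform of a latent AR(1) process with parameter `rho`, which is just a convenient stand-in and not a claim about real genomic structure. The parameters `sigma`, `width`, the `rho` values and the observed statistic of 10 are all hypothetical.

```python
import math
import random

def autocorrelated_null(n_positions, base_rate, rho, sigma, rng):
    """Simulate 0/1 counts whose marker rate follows a latent AR(1) process."""
    x = rng.gauss(0.0, 1.0)                      # stationary start, variance 1
    innovation_sd = math.sqrt(1.0 - rho * rho)   # keeps the variance at 1
    counts = []
    for _ in range(n_positions):
        # Log-normal rate, scaled so its mean stays at base_rate.
        rate = base_rate * math.exp(sigma * x - 0.5 * sigma * sigma)
        counts.append(1 if rng.random() < rate else 0)
        x = rho * x + innovation_sd * rng.gauss(0.0, 1.0)
    return counts

def max_window_sum(counts, width):
    """Largest sum over all windows of `width` consecutive positions."""
    total = sum(counts[:width])
    best = total
    for i in range(width, len(counts)):
        total += counts[i] - counts[i - width]
        best = max(best, total)
    return best

def p_value_by_rho(observed_max, n_positions, base_rate, width, rhos,
                   sigma=1.0, n_sims=100, seed=0):
    """Empirical p-value of the observed maximum, one per autocorrelation level."""
    rng = random.Random(seed)
    result = {}
    for rho in rhos:
        exceed = sum(
            max_window_sum(
                autocorrelated_null(n_positions, base_rate, rho, sigma, rng), width
            ) >= observed_max
            for _ in range(n_sims)
        )
        result[rho] = (1 + exceed) / (1 + n_sims)
    return result

# Hypothetical inputs: an observed maximum window sum of 10, 30 markers in 2000 positions.
p_values = p_value_by_rho(
    observed_max=10, n_positions=2000, base_rate=30 / 2000,
    width=20, rhos=[0.0, 0.9, 0.99],
)
for rho, p in p_values.items():
    print(rho, p)
```

Reporting the p-value at each `rho` shows how sensitive the conclusion is to the assumed background clustering.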

written 9.0 years ago by brentp23k