Question: How To Determine If Genomic Region Is Closer To A Centromere/Telomere
6.5 years ago, PoGibas (4.8k) wrote:

Let's say these are the binding sites of MotifX (BED: chromosome, start, end):

chr2    70258563    70258573
chr2    70277815    70277825
chr2    113996917    113996927

I want to see whether they are closer to the centromeric or to a telomeric region.
How can I test whether a set of genomic regions preferentially lies closer to the centromere or to the telomeres?

My solution is this. The chromosome length can be obtained as described in "Easiest Way To Obtain Chromosome Length?", and the centromere position with:

curl -s "" | gunzip -c | awk 'BEGIN {OFS="\t"}; ($2=="chr2" && $8=="centromere") {print $2,$3,$4}'
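
For readers without awk, the same filter can be sketched in Python. It assumes, exactly as the awk command does, whitespace-separated rows with the chromosome in column 2, start and end in columns 3 and 4, and the feature type in column 8. The helper names are hypothetical, and the path passed to `centromere_from_gz` stands for a local copy of the table, since the download URL is not given here.

```python
import gzip


def centromere_interval(lines, chrom):
    """Return (start, end) of the first 'centromere' row for chrom, or None.

    Mirrors the awk filter ($2 == chrom && $8 == "centromere") {print $3, $4};
    awk columns are 1-based, Python list indices are 0-based.
    """
    for line in lines:
        fields = line.split()  # awk's default whitespace splitting
        if len(fields) >= 8 and fields[1] == chrom and fields[7] == "centromere":
            return int(fields[2]), int(fields[3])
    return None


def centromere_from_gz(path, chrom):
    """Same as centromere_interval, reading a gzipped local file (like gunzip -c)."""
    with gzip.open(path, "rt") as handle:  # "rt" = decompress and decode as text
        return centromere_interval(handle, chrom)
```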

Chromosome length: chr2 243199373
Centromere position: chr2 92326171    95326171

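From this we can build a chromosome map by splitting each arm at its midpoint, so that every position falls in the region of whichever it is closer to, the telomere or the centromere: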

Telomere region A = [1, 92326171/2]
Centromere region = [92326171/2, 95326171 + (243199373 - 95326171)/2]
Telomere region B = [95326171 + (243199373 - 95326171)/2, 243199373]
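
The arithmetic can be checked with a short Python sketch. `chromosome_map` is a hypothetical helper; it uses integer division and the same 1-based, inclusive-end convention as the intervals in the BED file below, and with the chr2 values above it reproduces them exactly.

```python
def chromosome_map(chrom_len, cen_start, cen_end):
    """Split a chromosome at the midpoint of each arm.

    Positions between a telomere and its arm midpoint are closer to the
    telomere; everything from the p-arm midpoint to the q-arm midpoint
    (including the centromere) is closer to the centromere.
    Coordinates are 1-based with inclusive ends, as in the post.
    """
    p_mid = cen_start // 2                        # midpoint of the p arm
    q_mid = cen_end + (chrom_len - cen_end) // 2  # midpoint of the q arm
    return [
        ("Telomere_region", 1, p_mid),
        ("Centromere_region", p_mid + 1, q_mid - 1),
        ("Telomere_region", q_mid, chrom_len),
    ]


for label, start, end in chromosome_map(243199373, 92326171, 95326171):
    print("chr2", start, end, label, sep="\t")
```

If the file is meant as strict BED (0-based start, end excluded), the equivalents of these 1-based intervals are 0 to 46163085, 46163085 to 169262771 and 169262771 to 243199373, which leaves no gap at the boundaries.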

     Telomere A            Centromere              Telomere B
      region                region                  region

And a BED file:

cat chromosome_map.bed
chr2    1            46163085     Telomere_region
chr2    46163086     169262771    Centromere_region
chr2    169262772    243199373    Telomere_region

intersectBed -a MotifX_bs.bed -b chromosome_map.bed -wo
chr2    70258563    70258573    chr2    46163086    169262771    Centromere_region    10
chr2    70277815    70277825    chr2    46163086    169262771    Centromere_region    10
chr2    113996917    113996927    chr2    46163086    169262771    Centromere_region    10

grep -c Centromere intersectBed_output
grep -c Telomere intersectBed_output
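
The intersect-and-grep counting step can also be sketched without bedtools. This is an illustrative stand-in, not a reimplementation of `intersectBed`: each site is assigned to the region containing its midpoint, so a site straddling a boundary is counted once, whereas `intersectBed -wo` would report it once per overlapped region. The intervals are copied from chromosome_map.bed and the sites from the MotifX example; the helper name is hypothetical.

```python
from collections import Counter

# Intervals from chromosome_map.bed (inclusive ends, as written in the post)
REGIONS = [
    ("chr2", 1, 46163085, "Telomere_region"),
    ("chr2", 46163086, 169262771, "Centromere_region"),
    ("chr2", 169262772, 243199373, "Telomere_region"),
]

# MotifX binding sites from the question
SITES = [
    ("chr2", 70258563, 70258573),
    ("chr2", 70277815, 70277825),
    ("chr2", 113996917, 113996927),
]


def count_by_region(sites, regions):
    """Count sites per region label, placing each site by its midpoint."""
    counts = Counter()
    for chrom, start, end in sites:
        mid = (start + end) // 2
        for r_chrom, r_start, r_end, label in regions:
            if chrom == r_chrom and r_start <= mid <= r_end:
                counts[label] += 1
                break
    return counts


counts = count_by_region(SITES, REGIONS)
print(counts["Centromere_region"], counts["Telomere_region"])  # 3 0
```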

From these counts we can run a one-sided Fisher's exact test:

> # row 1: counts (Centromere, Telomere); row 2: 3 sites minus each count
> table = matrix(c(3, 0, 3-3, 3-0), ncol=2, byrow=TRUE)
> table
     [,1] [,2]
[1,]    3    0
[2,]    0    3
> fisher.test(table, alternative="greater")

Fisher's Exact Test for Count Data

   data:  table 
   p-value = 0.05
   alternative hypothesis: true odds ratio is greater than 1
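
As a cross-check, the p-value above can be reproduced from the hypergeometric distribution, which is what a one-sided Fisher's exact test sums over. The sketch also adds a swapped-in alternative that is not part of the original post: an exact one-sided binomial test whose null probability is the fraction of chr2 covered by the Centromere_region interval of chromosome_map.bed, so that the slightly unequal sizes of the two region classes are taken into account. Both functions are hypothetical helpers using only the standard library.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value (alternative='greater') for [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables at least as extreme as the
    observed one, with row and column totals held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    return sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    ) / total


def binomial_greater(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): exact one-sided binomial test."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


# Table from the R session: rows [3, 0] and [0, 3]
print(fisher_greater(3, 0, 0, 3))  # 0.05, as in the R output

# Null probability: share of chr2 inside the Centromere_region interval
p_null = (169262771 - 46163086 + 1) / 243199373
print(binomial_greater(3, 3, p_null))
```

With all three sites in the centromere-proximal interval, the binomial p-value is simply `p_null` cubed, about 0.13.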

But this seems way too complicated and probably not very accurate (it's also possible to divide the chromosome into bins and assign values).
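
The binning idea can be sketched as follows, under the assumption (not from the post) that the value assigned to each site is its relative position along its chromosome arm: 0 at the centromere and 1 at the telomere. A value below 0.5 then corresponds to the Centromere_region of the map above, and a histogram of the values can be compared with the flat shape expected if sites were placed uniformly along the arms. The helper names are hypothetical.

```python
CHROM_LEN = 243199373
CEN_START, CEN_END = 92326171, 95326171


def arm_position(start, end):
    """Relative distance from the centromere: 0 at the centromere, 1 at the telomere."""
    mid = (start + end) / 2
    if mid < CEN_START:                             # p arm
        return (CEN_START - mid) / CEN_START
    if mid > CEN_END:                               # q arm
        return (mid - CEN_END) / (CHROM_LEN - CEN_END)
    return 0.0                                      # inside the centromere


def histogram(values, n_bins=10):
    """Count values in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1  # v == 1.0 goes in the last bin
    return counts


sites = [(70258563, 70258573), (70277815, 70277825), (113996917, 113996927)]
values = [arm_position(s, e) for s, e in sites]
print([round(v, 3) for v in values])
print(histogram(values))
```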

modified 6.5 years ago by Ian • written 6.5 years ago by PoGibas
6.5 years ago, Gabriel R. (2.6k), Danmarks Tekniske Universitet, wrote:

If I were you, I would do the following:

  1. Take your empirical sites and use something like closestBed to find whether each one is closer to the centromere or to a telomere.
  2. Generate something like 10,000 random regions genome-wide and annotate each of them the same way.
  3. Compare the distribution from step 2 with the score you get in step 1.
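
One way to read these three steps, sketched in plain Python for chr2 only: instead of closestBed, the distance from each region to the centromere and to the nearer chromosome end is computed directly; step 2 draws replicate sets of random regions with the same widths, placed uniformly on chr2; and step 3 reports how often a random set has at least as many centromere-proximal regions as the real sites. Uniform placement is an assumption that ignores assembly gaps and the selection biases discussed next, and the function names are hypothetical.

```python
import random

CHROM_LEN = 243199373
CEN_START, CEN_END = 92326171, 95326171  # treated as 0-based, half-open (assumption)


def closer_to_centromere(start, end):
    """True if [start, end) is nearer the centromere than the nearer chromosome end."""
    if end <= CEN_START:                          # p arm
        return CEN_START - end < start
    if start >= CEN_END:                          # q arm
        return start - CEN_END < CHROM_LEN - end
    return True                                   # overlaps the centromere


def permutation_p(sites, n_random=10_000, seed=1):
    """Empirical one-sided p-value for the number of centromere-proximal sites."""
    rng = random.Random(seed)
    observed = sum(closer_to_centromere(s, e) for s, e in sites)
    widths = [e - s for s, e in sites]
    as_extreme = 0
    for _ in range(n_random):
        hits = 0
        for w in widths:
            s = rng.randrange(CHROM_LEN - w + 1)  # uniform start; region stays on chr2
            hits += closer_to_centromere(s, s + w)
        as_extreme += hits >= observed
    # +1 in numerator and denominator so the estimate is never exactly zero
    return observed, (as_extreme + 1) / (n_random + 1)


sites = [(70258563, 70258573), (70277815, 70277825), (113996917, 113996927)]
print(permutation_p(sites))
```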

Step 2 will be a bit trickier. You will need a sensible criterion for selecting the random regions, so that the comparison does not simply reflect some bias in how you selected your sites in step 1. More concretely, if your sites in step 1 are heavily biased towards containing a certain motif, the random regions in step 2 should reflect that as well.
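
A sketch of the bias-matched version of step 2, assuming you can list a pool of candidate regions that share the selection bias of the real sites (for example, every genome-wide occurrence of the motif; building that pool is not shown, and the toy pool below is purely illustrative). Random sets are drawn from the pool without replacement instead of uniformly, and the empirical p-value is computed as before. The function names are hypothetical.

```python
import random


def matched_null(pool, k, is_proximal, n_random=10_000, seed=1):
    """Null counts of centromere-proximal regions among k draws from pool."""
    rng = random.Random(seed)
    return [
        sum(is_proximal(region) for region in rng.sample(pool, k))
        for _ in range(n_random)
    ]


def empirical_p(observed, null):
    """One-sided empirical p-value: share of null counts >= observed (+1 correction)."""
    return (1 + sum(x >= observed for x in null)) / (1 + len(null))


def in_centromere_region(region):
    """Proximal if the start falls in the post's Centromere_region interval on chr2."""
    start, _end = region
    return 46163086 <= start <= 169262771


# Toy pool: one 10-bp region every 1 Mb along chr2 (illustrative only)
toy_pool = [(s, s + 10) for s in range(0, 243_000_000, 1_000_000)]
null = matched_null(toy_pool, k=3, is_proximal=in_centromere_region)
print(empirical_p(3, null))
```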

Good luck, have fun!

6.5 years ago, Ian (5.6k), University of Manchester, UK, wrote:

This sounds like something the Genome Hyperbrowser might be able to answer. I do not know for certain, but it can't hurt to look.
