Question

How To Determine If Genomic Region Is Closer To A Centromere/Telomere

9

Entering edit mode

10.9 years ago

PoGibas 5.1k

Lets say these are the binding sites of MotifX:

chr2    70258563    70258573
chr2    70277815    70277825
chr2    113996917    113996927

I want to see if they are closer to a centromeric or telomeric region.
How can I calculate genomic region preference to be closer to a centromere/telomere?

My solution is this:
Here we can get Easiest Way To Obtain Chromosome Length? and centrome position
(curl -s "ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gap.txt.gz" | gunzip -c | awk 'BEGIN {OFS="\t"}; ($2=="chr2" && $8=="centromere") {print $2,$3,$4}').

length: chr2 243199373
position: chr2 92326171    95326171

From this we can make chromosome map:

Telomere region A = [1 ; 92326171/2]
Centromere region = [92326171/2 ; 95326171+((243199373-95326171)/2)]
Telomere region B = [95326171+(243199373-95326171)/2 ; 243199373]

  1------//------46163085------//------169262772------//------243199373
     TelemoreA             Centromere              TelemoreB
      region                region                  region

And a BED file:

cat chromosome_map.bed
chr2    1            46163085     Telomere_region
chr2    46163086     169262771    Centromere_region
chr2    169262772    243199373    Telomere_region

intersectBed -a MotifX_bs.bed -b chromosome_map.bed -wo
chr2    70258563    70258573    chr2    46163086    169262771    Centromere_region    10
chr2    70277815    70277825    chr2    46163086    169262771    Centromere_region    10
chr2    113996917    113996927    chr2    46163086    169262771    Centromere_region    10

grep -c Centromere intersectBed_output
3
grep -c Telomere intersectBed_output
0

From this we can make Fisher's test:

>table = matrix(c(3,0,3-3,3-0), ncol=2, byrow=T)
>table
       [,1] [,2]
 [1,]    3    0
 [2,]    0    3
>fisher.test(table, alternative="greater")

Fisher's Exact Test for Count Data

   data:  table 
   p-value = 0.05
   alternative hypothesis: true odds ratio is greater than 1

But this seems way to complicated and probably not so accurate (it's possible to divide chromosome into bins and assign values).

• 4.0k views

ADD COMMENT • link updated 10.9 years ago by Ian 6.0k • written 10.9 years ago by PoGibas 5.1k

score 3 · Answer 1 · 2013-06-17

If I were you, I would do the following

Take your empirical data, use something like closestBed to find which one is closer.
Generate something like 10,000 regions genome-wide and annotate each as well
Compare the distribution of 2. with the score you get for 1.

The step 2. will be a bit more tricky. You will need to come up with a decent random chunk selection criteria that will not just reflect some bias that you have in your site selection for 1. More practically, imagine your sites in 1. are heavily biased towards having a certain motif. 2. should reflect that as well.

Good luck, have fun !

score 0 · Answer 2 · 2013-06-17

0

Entering edit mode

10.9 years ago

Ian 6.0k

This sounds like something the Genome Hyperbrowser might be able to answer. I do not know for certain, but it cannot hurt looking.

ADD COMMENT • link 10.9 years ago by Ian 6.0k