How To Determine If Genomic Region Is Closer To A Centromere/Telomere
10.9 years ago
PoGibas 5.1k

Let's say these are the binding sites of MotifX:

chr2    70258563    70258573
chr2    70277815    70277825
chr2    113996917    113996927

I want to see whether they are closer to a centromeric or a telomeric region.
How can I test whether genomic regions preferentially lie closer to a centromere or a telomere?


My solution is this:
We can get the chromosome length (see the post "Easiest Way To Obtain Chromosome Length?") and the centromere position:
curl -s "ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gap.txt.gz" | gunzip -c | awk 'BEGIN {OFS="\t"}; ($2=="chr2" && $8=="centromere") {print $2,$3,$4}'

length: chr2 243199373
position: chr2 92326171    95326171

From this we can build a chromosome map:

Telomere region A = [1 ; 92326171/2]
Centromere region = [92326171/2 ; 95326171+((243199373-95326171)/2)]
Telomere region B = [95326171+(243199373-95326171)/2 ; 243199373]

  1------//------46163085------//------169262772------//------243199373
     TelomereA             Centromere              TelomereB
      region                region                  region
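
The boundaries in this map are simply the midpoints of the two chromosome arms. As a minimal sketch of that arithmetic in Python (the function name `build_chromosome_map` is hypothetical, and the hg19 chr2 numbers are the ones quoted above), it could look like this:

```python
def build_chromosome_map(chrom, length, cen_start, cen_end):
    """Split a chromosome into telomere- and centromere-proximal regions.

    The cut points are the arm midpoints, as in the map above.
    """
    # Midpoint of the p arm: halfway between position 1 and the centromere start.
    p_mid = cen_start // 2
    # Midpoint of the q arm: halfway between the centromere end and the chromosome end.
    q_mid = cen_end + (length - cen_end) // 2
    return [
        (chrom, 1, p_mid, "Telomere_region"),
        (chrom, p_mid + 1, q_mid - 1, "Centromere_region"),
        (chrom, q_mid, length, "Telomere_region"),
    ]


chr2_map = build_chromosome_map("chr2", 243199373, 92326171, 95326171)
for row in chr2_map:
    print(*row, sep="\t")
```

Running the same function per chromosome would give a genome-wide map, assuming one centromere gap entry per chromosome.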

And a BED file:

cat chromosome_map.bed
chr2    1            46163085     Telomere_region
chr2    46163086     169262771    Centromere_region
chr2    169262772    243199373    Telomere_region

intersectBed -a MotifX_bs.bed -b chromosome_map.bed -wo
chr2    70258563    70258573    chr2    46163086    169262771    Centromere_region    10
chr2    70277815    70277825    chr2    46163086    169262771    Centromere_region    10
chr2    113996917    113996927    chr2    46163086    169262771    Centromere_region    10

grep -c Centromere intersectBed_output
3
grep -c Telomere intersectBed_output
0

From these counts we can run Fisher's exact test:

>table = matrix(c(3,0,3-3,3-0), ncol=2, byrow=T)
>table
       [,1] [,2]
 [1,]    3    0
 [2,]    0    3
>fisher.test(table, alternative="greater")

Fisher's Exact Test for Count Data

   data:  table 
   p-value = 0.05
   alternative hypothesis: true odds ratio is greater than 1
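
To see where 0.05 comes from: with all margins equal to 3, only one of the C(6,3) = 20 possible tables is as extreme as the observed one, so p = 1/20. A small Python sketch of the one-sided test (the helper `fisher_greater` is hypothetical and is not R's implementation) reproduces that:

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided (greater) Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    # Sum hypergeometric probabilities over top-left counts >= the observed count.
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, row1)


p = fisher_greater(3, 0, 0, 3)
print(p)
```

With only three sites, 0.05 is the smallest p-value this table layout can produce, which is one reason the test is not very informative here.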

But this seems way too complicated and probably not very accurate (it would also be possible to divide the chromosome into bins and assign values).
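
One way to "assign values" instead of using three hard regions is a continuous score per site. The sketch below is only an illustration under assumptions not in the original post: the function `centromere_proximity` and its linear 0-to-1 scaling along each arm are made up for this example.

```python
def centromere_proximity(pos, length, cen_start, cen_end):
    """Return 0.0 at a chromosome end, 1.0 at (or inside) the centromere.

    The score grows linearly along each arm, so values above 0.5
    correspond roughly to the Centromere_region of the map above.
    """
    if pos < cen_start:  # p arm
        return (pos - 1) / (cen_start - 1)
    if pos > cen_end:  # q arm
        return (length - pos) / (length - cen_end)
    return 1.0  # inside the centromere gap itself


# hg19 chr2 length and centromere coordinates from the question.
sites = [70258563, 70277815, 113996917]
scores = [centromere_proximity(s, 243199373, 92326171, 95326171) for s in sites]
for site, score in zip(sites, scores):
    print(site, round(score, 3))
```

Scores like these could then be binned or compared against a background distribution rather than reduced to a 2x2 table.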

10.9 years ago
Gabriel R. ★ 2.9k

If I were you, I would do the following:

  1. Take your empirical data and use something like closestBed to find whether each site is closer to a centromere or a telomere.
  2. Generate something like 10,000 random regions genome-wide and annotate each in the same way.
  3. Compare the distribution from 2. with the score you get for 1.

Step 2 will be a bit trickier. You will need decent selection criteria for the random regions, so that the comparison does not merely reflect some bias in how the sites in 1. were selected. More practically, imagine your sites in 1. are heavily biased towards having a certain motif: the regions in 2. should reflect that as well.
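
As a rough sketch of the three steps in Python, the example below deliberately uses the naive null that the previous paragraph warns about: positions drawn uniformly from chr2 only. That choice, the helper names, and the hg19 chr2 coordinates taken from the question are all assumptions for illustration; a realistic background would need to match the biases of the real sites.

```python
import random

# hg19 chr2 length and centromere coordinates quoted in the question.
CHR2_LEN, CEN_START, CEN_END = 243199373, 92326171, 95326171


def closer_to_centromere(pos):
    """True if pos is nearer to the centromere than to the nearest chromosome end."""
    if CEN_START <= pos <= CEN_END:
        return True
    dist_cen = CEN_START - pos if pos < CEN_START else pos - CEN_END
    dist_tel = min(pos - 1, CHR2_LEN - pos)
    return dist_cen < dist_tel


def empirical_p(sites, n_perm=10_000, seed=0):
    """Share of random site sets with at least as many centromere-proximal sites."""
    rng = random.Random(seed)
    observed = sum(closer_to_centromere(s) for s in sites)
    hits = 0
    for _ in range(n_perm):
        # Naive null: uniform positions on chr2 (an assumption, see above).
        sample = [rng.randint(1, CHR2_LEN) for _ in sites]
        if sum(closer_to_centromere(s) for s in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


observed, p_value = empirical_p([70258563, 70277815, 113996917])
print(observed, p_value)
```

Swapping the `rng.randint` line for a sampler that draws motif-matched or otherwise bias-matched regions is where most of the real work would go.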

Good luck, have fun!

10.9 years ago
Ian 6.0k

This sounds like something the Genome Hyperbrowser might be able to answer. I do not know for certain, but it cannot hurt to look.
