Question: How To Determine If Genomic Region Is Closer To A Centromere/Telomere
7
6.5 years ago by
PoGibas4.8k
Vilnius
PoGibas4.8k wrote:

Lets say these are the binding sites of MotifX:

``````chr2    70258563    70258573
chr2    70277815    70277825
chr2    113996917    113996927
``````

I want to see if they are closer to a centromeric or telomeric region.
How can I calculate genomic region preference to be closer to a centromere/telomere?

My solution is this:
Here we can get Easiest Way To Obtain Chromosome Length? and centrome position
(```curl -s "ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/gap.txt.gz" | gunzip -c | awk 'BEGIN {OFS="\t"}; (\$2=="chr2" && \$8=="centromere") {print \$2,\$3,\$4}'```).

``````length: chr2 243199373
position: chr2 92326171    95326171
``````

From this we can make chromosome map:

Telomere region A = [1 ; 92326171/2]
Centromere region = [92326171/2 ; 95326171+((243199373-95326171)/2)]
Telomere region B = [95326171+(243199373-95326171)/2 ; 243199373]

``````  1------//------46163085------//------169262772------//------243199373
TelemoreA             Centromere              TelemoreB
region                region                  region
``````

And a BED file:

``````cat chromosome_map.bed
chr2    1            46163085     Telomere_region
chr2    46163086     169262771    Centromere_region
chr2    169262772    243199373    Telomere_region

intersectBed -a MotifX_bs.bed -b chromosome_map.bed -wo
chr2    70258563    70258573    chr2    46163086    169262771    Centromere_region    10
chr2    70277815    70277825    chr2    46163086    169262771    Centromere_region    10
chr2    113996917    113996927    chr2    46163086    169262771    Centromere_region    10

grep -c Centromere intersectBed_output
3
grep -c Telomere intersectBed_output
0
``````

From this we can make Fisher's test:

``````>table = matrix(c(3,0,3-3,3-0), ncol=2, byrow=T)
>table
[,1] [,2]
[1,]    3    0
[2,]    0    3
>fisher.test(table, alternative="greater")

Fisher's Exact Test for Count Data

data:  table
p-value = 0.05
alternative hypothesis: true odds ratio is greater than 1
``````

But this seems way to complicated and probably not so accurate (it's possible to divide chromosome into bins and assign values).

• 2.7k views
modified 6.5 years ago by Ian5.6k • written 6.5 years ago by PoGibas4.8k
3
6.5 years ago by
Gabriel R.2.6k
Danmarks Tekniske Universitet
Gabriel R.2.6k wrote:

If I were you, I would do the following

1. Take your empirical data, use something like closestBed to find which one is closer.
2. Generate something like 10,000 regions genome-wide and annotate each as well
3. Compare the distribution of 2. with the score you get for 1.

The step 2. will be a bit more tricky. You will need to come up with a decent random chunk selection criteria that will not just reflect some bias that you have in your site selection for 1. More practically, imagine your sites in 1. are heavily biased towards having a certain motif. 2. should reflect that as well.

Good luck, have fun !

0
6.5 years ago by
Ian5.6k
University of Manchester, UK
Ian5.6k wrote:

This sounds like something the Genome Hyperbrowser might be able to answer. I do not know for certain, but it cannot hurt looking.