Question

Genomic Regions To Exclude Before Shuffling Intervals

4


10.4 years ago

PoGibas 5.1k

I want to do a permutation test: randomly reposition (shuffle) a given set of genomic intervals and measure the intersection between the new coordinates and a specific genomic element.

Example:

Different sets of genes (protein coding, pseudogenes, ncRNA): these are the intervals that I want to shuffle.
L1 genomic repeats: their coordinates stay fixed.
For every gene set I shuffle the intervals, intersect them with L1 and measure the overlap (I am using bedtools shuffle - "reposition each feature in the input BED file on a random chromosome at a random position").
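To make the procedure above concrete, here is a minimal sketch of the same permutation test in plain Python. It is not what bedtools does internally: it assumes a single toy chromosome, half-open (start, end) intervals, and non-overlapping L1 intervals. All names and coordinates (`overlap_bp`, `shuffle_intervals`, `permutation_test`, the toy data) are made up for illustration.

```python
import random

def overlap_bp(query, target):
    """Sum of shared base pairs between two lists of half-open (start, end) intervals.
    Assumes the target intervals do not overlap each other."""
    total = 0
    for q_start, q_end in query:
        for t_start, t_end in target:
            total += max(0, min(q_end, t_end) - max(q_start, t_start))
    return total

def shuffle_intervals(intervals, chrom_len, rng):
    """Move every interval to a uniformly random start, keeping its length."""
    shuffled = []
    for start, end in intervals:
        length = end - start
        new_start = rng.randrange(0, chrom_len - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def permutation_test(genes, l1, chrom_len, n_perm=1000, seed=0):
    """Return the observed overlap, the null distribution and an empirical p-value."""
    rng = random.Random(seed)
    observed = overlap_bp(genes, l1)
    null = [overlap_bp(shuffle_intervals(genes, chrom_len, rng), l1)
            for _ in range(n_perm)]
    # One-sided p-value with +1 correction so it is never exactly zero.
    p_value = (sum(x >= observed for x in null) + 1) / (n_perm + 1)
    return observed, null, p_value

# Toy data (hypothetical coordinates on a 1 kb "chromosome").
CHROM_LEN = 1000
GENES = [(100, 200), (500, 650)]   # intervals that get shuffled
L1 = [(150, 300), (600, 700)]      # intervals that stay fixed

observed, null, p_value = permutation_test(GENES, L1, CHROM_LEN)
print(f"observed overlap: {observed} bp, empirical p = {p_value:.3f}")
```

The same loop would be run once per gene set (protein coding, pseudogenes, ncRNA), each time against the unchanged L1 coordinates.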

Question: which genomic regions should I exclude from the "genome" (bedtools shuffle -g option) before shuffling gene intervals?
I was going to exclude gaps in the assembly.
But what about:

All gene regions.
If I am shuffling pseudogene intervals should I exclude protein coding and ncRNA coordinates?
All non-L1 RepeatMasker coordinates.
Alu, LTR and DNA transposons aren't L1, so there won't be any intersection with them?

bedtools • 4.0k views

ADD COMMENT • link updated 7.6 years ago by Biostar 20 • written 10.4 years ago by PoGibas 5.1k

0


I am a bit confused about what you are trying to do here. You want to pick genomic coordinates at random (do you mean intervals? A coordinate is a single fixed point; an interval requires two points) and see if they overlap with repeats (L1)? In your example it seems like you have several types of genomic elements and you are going to pick some at random and see if they overlap repeats? What do you mean by shuffling? Are you going to keep the width of each element the same and shift them around the genome, and do you want to exclude all functional regions?

ADD REPLY • link 10.4 years ago by Ying W ★ 4.2k

0


I edited my question.

ADD REPLY • link 10.4 years ago by PoGibas 5.1k

score 6 · Answer 1 · 2013-11-20

Greetings,

I have done this for our paper:

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003470

I excluded gaps only. I suppose it comes down to what you are testing. We wanted to establish a null hypothesis about how close TEs were to non-coding RNAs.

I would suggest trying both directions: shuffle the genes and test for overlap with the TEs/LINEs, then shuffle the TEs/LINEs and test for overlap with the genes.
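As a rough illustration of "gaps only, both directions", here is a small Python sketch. It is not the code used for the paper. It assumes one toy chromosome, half-open (start, end) intervals, and rejection sampling to keep shuffled intervals out of the gaps (similar in spirit to what the `-excl` option of bedtools shuffle is for). Function names and coordinates are invented for the example.

```python
import random

def overlaps_any(start, end, regions):
    """True if the half-open interval [start, end) overlaps any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)

def shuffle_excluding(intervals, chrom_len, excluded, rng, max_tries=1000):
    """Re-place each interval at random, retrying until it avoids the excluded regions."""
    placed = []
    for start, end in intervals:
        length = end - start
        for _ in range(max_tries):
            new_start = rng.randrange(0, chrom_len - length + 1)
            if not overlaps_any(new_start, new_start + length, excluded):
                placed.append((new_start, new_start + length))
                break
        else:
            raise RuntimeError("could not place interval outside excluded regions")
    return placed

def count_hits(query, target):
    """Number of query intervals that overlap at least one target interval."""
    return sum(1 for start, end in query if overlaps_any(start, end, target))

def empirical_p(fixed, moved, chrom_len, excluded, n_perm=500, seed=1):
    """Shuffle `moved`, keep `fixed` in place, return (observed hits, one-sided p-value)."""
    rng = random.Random(seed)
    observed = count_hits(moved, fixed)
    null = [count_hits(shuffle_excluding(moved, chrom_len, excluded, rng), fixed)
            for _ in range(n_perm)]
    return observed, (sum(x >= observed for x in null) + 1) / (n_perm + 1)

# Toy data (hypothetical coordinates on a 10 kb "chromosome").
CHROM_LEN = 10_000
GAPS = [(0, 500), (4_000, 5_000)]                       # assembly gaps to avoid
GENES = [(600, 900), (2_000, 2_400), (7_000, 7_300)]
L1 = [(800, 1_200), (2_300, 2_600), (9_000, 9_400)]

# Direction 1: shuffle genes, keep L1 fixed.
obs_genes, p_genes = empirical_p(L1, GENES, CHROM_LEN, GAPS)
# Direction 2: shuffle L1, keep genes fixed.
obs_l1, p_l1 = empirical_p(GENES, L1, CHROM_LEN, GAPS)
print(f"shuffle genes: {obs_genes} hits, p = {p_genes:.3f}")
print(f"shuffle L1:    {obs_l1} hits, p = {p_l1:.3f}")
```

If the two directions give very different p-values, that is a hint the result depends on the size and clustering of whichever set is being shuffled.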

I also found GenometriCorr useful:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002529