Question: Genomic Regions To Exclude Before Shuffling Intervals
6.9 years ago, PoGibas (Vilnius) wrote:

I want to do a permutation test: randomly reposition (shuffle) given genomic intervals and measure the intersection between the new coordinates and a specific genomic element.

Example:

  • Different sets of genes (protein coding, pseudogenes, ncRNA): these are the intervals that I want to shuffle.
    The genomic repeat L1: its coordinates stay fixed.
  • For every gene set, shuffle the intervals, intersect them with L1 and measure the overlap (I am using bedtools shuffle: "reposition each feature in the input BED file on a random chromosome at a random position").
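
A minimal sketch of the procedure described above, written in plain Python rather than bedtools so that each step is visible. It assumes a single chromosome, half-open (start, end) intervals, and toy coordinates; the function names (`merge`, `overlap_bp`, `shuffle`, `permutation_test`) are made up for illustration and are not part of any library.

```python
import random

def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def overlap_bp(query, target):
    """Sum, over query intervals, of the bases covered by merged target intervals."""
    total = 0
    for qs, qe in query:
        for ts, te in target:
            total += max(0, min(qe, te) - max(qs, ts))
    return total

def shuffle(intervals, chrom_len, rng):
    """Reposition each interval uniformly at random, keeping its width."""
    out = []
    for start, end in intervals:
        width = end - start
        new_start = rng.randrange(0, chrom_len - width + 1)
        out.append((new_start, new_start + width))
    return out

def permutation_test(genes, l1, chrom_len, n_perm=1000, seed=0):
    """Compare the observed gene/L1 overlap with overlaps of shuffled genes."""
    rng = random.Random(seed)
    l1 = merge(l1)  # L1 coordinates stay fixed
    observed = overlap_bp(genes, l1)
    null = [overlap_bp(shuffle(genes, chrom_len, rng), l1) for _ in range(n_perm)]
    # Empirical one-sided p-value with the usual add-one correction.
    p = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, null, p

# Toy data (assumed, not real annotations): every gene lies inside an L1 element.
chrom_len = 10_000
l1 = [(1000, 2000), (5000, 5500)]
genes = [(1100, 1400), (1500, 1900), (5100, 5300)]

observed, null, p = permutation_test(genes, l1, chrom_len)
print(observed, p)
```

The overlap here is measured in base pairs; counting overlapping intervals instead would only change `overlap_bp`. Nothing is excluded from the "genome" in this sketch, which is exactly the choice the question below is about.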

Question: which genomic regions should I exclude from the "genome" (bedtools shuffle -g option) before shuffling the gene intervals?
I was going to exclude gaps in the assembly.
But what about:

  • All gene regions.
    If I am shuffling pseudogene intervals, should I exclude protein coding and ncRNA coordinates?
  • All non-L1 RepeatMasker coordinates.
    Alu, LTR and DNA transposons aren't L1, so there won't be any intersection with them?

I am a bit confused about what you are trying to do here. You want to pick genomic coordinates at random (do you mean intervals? A coordinate is a single fixed point; an interval requires two points) and see if they overlap with repeats (L1)? In your example it seems like you have several types of genomic elements and you are going to pick some at random and see if they overlap repeats? What do you mean by shuffling? Are you going to keep the width of each element the same and shift them around the genome, and do you want to exclude all functional regions?

Comment written 6.9 years ago by Ying W

I edited my question.

Comment written 6.9 years ago by PoGibas
6.9 years ago, Zev.Kronenberg (United States) wrote:

Greetings,

I have done this for our paper:

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003470

I excluded gaps only. I suppose it comes down to what you are testing. We wanted to establish a null hypothesis about how close TEs were to non-coding RNAs.
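
As a rough illustration of what "excluding gaps only" means for the shuffle, here is a small Python sketch that removes gap intervals from a chromosome and then places an interval uniformly among the positions where it fits entirely inside a gap-free block. The helper names (`subtract`, `place`) and the toy coordinates are assumptions for illustration. With bedtools itself, excluded regions are normally passed to `bedtools shuffle` as a BED file via `-excl`, while `-g` takes the chromosome sizes.

```python
import random

def subtract(chrom_len, excluded):
    """Return the parts of [0, chrom_len) not covered by the excluded intervals.

    Assumes half-open (start, end) intervals that lie within the chromosome.
    """
    allowed = []
    pos = 0
    for start, end in sorted(excluded):
        if start > pos:
            allowed.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_len:
        allowed.append((pos, chrom_len))
    return allowed

def place(width, allowed, rng):
    """Pick a start uniformly among all starts where the interval fits in one block."""
    # For a block (s, e), the valid starts are s .. e - width, i.e. e - width - s + 1 of them.
    slots = [(s, e - width - s + 1) for s, e in allowed if e - s >= width]
    if not slots:
        raise ValueError("interval does not fit in any allowed block")
    k = rng.randrange(sum(n for _, n in slots))
    for s, n in slots:
        if k < n:
            return (s + k, s + k + width)
        k -= n

# Toy chromosome with two assembly gaps (assumed coordinates).
chrom_len = 1000
gaps = [(0, 100), (400, 600)]
allowed = subtract(chrom_len, gaps)

rng = random.Random(0)
placed = [place(50, allowed, rng) for _ in range(1000)]
print(allowed, placed[:3])
```

Excluding other feature classes (other genes, non-L1 repeats) would simply add more intervals to `gaps`; whether that is appropriate depends on the null hypothesis being tested, as noted above.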

I would suggest trying both directions: shuffle the genes and test for overlap, then shuffle the TEs/LINEs and test for overlap.

I also found GenometriCorr useful:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002529
