Question

Mapability index from UCSC


8.2 years ago

Floydian_slip ▴ 170

Hi, I am trying to calculate a single mappability value for each of my regions of interest (about 300-600 bp each), which are spread all over the genome. Is that possible from the UCSC Mapability track output shown below?

I used the following Table Browser settings: clade: Mammal; genome: Human; assembly: hg19; group: Mapping and Sequencing Tracks; track: Mapability; table: wgEncodeCrgMapabilityAlign100mer; output: data points.

I supplied my regions in .bed format. These are some of them:

chr11 34470687 34471096

chr11 34472485 34472886

chr11 34473574 34473999

chr11 34474581 34475018

chr11 34475290 34475705

chr11 34477504 34477935

chr11 34492865 34493948

I left the output file name empty to show the results in the browser and these are some of the output lines:

track type=wiggle_0 name="CRG Align 100" description="Alignability of 100mers by GEM from ENCODE/CRG(Guigo)"

#bedGraph section chr11:34414404-34711729

chr11 34468326 34478748 1

#bedGraph section chr11:34414404-34711729

chr11 34468326 34478748 1

#bedGraph section chr11:34414404-34711729

chr11 34468326 34478748 1

#bedGraph section chr11:34414404-34711729

chr11 34468326 34478748 1

#bedGraph section chr11:34414404-34711729

chr11 34468326 34478748 1

As you can see, for the first few regions in my .bed file, it returned the same large interval of about 10 kb with a mapability of 1, and that interval contains all of those regions. I was expecting a single value per 100 bp bin. Should I assume that my smaller regions also have a mapability of 1?

Moreover, a little further down, the output gives:

#bedGraph section chr11:43736659-43874663

chr11 43872380 43874643 1

chr11 43874643 43874644 0.5

chr11 43874644 43874646 1

chr11 43874646 43874648 0.125

chr11 43874648 43874649 0.166667

chr11 43874649 43874653 0.2

chr11 43874653 43874654 0.166667

chr11 43874654 43874655 0.125

chr11 43874655 43874656 0.2

chr11 43874656 43874657 0.25

chr11 43874657 43874658 0.333333

chr11 43874658 43874660 0.00564972

chr11 43874660 43874662 0.00909091

chr11 43874662 43874663 0.00446429

As you can see, there are many single-nucleotide positions with very low mapability. How is that useful? Surely a single nucleotide is going to match all over the genome?
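For reference, as far as I understand the UCSC track description, the score is S = 1/(number of matches found in the genome) for the 100-mer window at that position, not for the single base itself. Under that reading (and assuming the value is reported at the window's first base), the scores above can be turned back into approximate match counts. The helper name `approx_match_count` is made up for illustration:

```python
def approx_match_count(score):
    """Invert S = 1 / (number of genome matches) for one bedGraph score.

    Assumes the score follows the CRG Alignability definition and is > 0.
    Rounding absorbs the limited precision of the printed values.
    """
    return round(1 / score)


# Scores copied from the output above.
for s in (1, 0.5, 0.125, 0.00564972):
    print(s, "->", approx_match_count(s), "match(es) for the 100-mer")
```

So a value such as 0.125 would mean that the 100-mer at that position aligns to about 8 places in the genome, which is a statement about the window rather than about one nucleotide.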

How do I calculate a single mappability value for each of my regions? Is that possible from the output above?
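To make the question concrete, this is roughly the kind of summary I have in mind: a length-weighted mean of the bedGraph scores overlapping each region. It is only a sketch, not an established method. The function name `region_mappability` is hypothetical, coordinates are assumed to be 0-based half-open as in BED, and whether a mean is the right summary (rather than, say, a minimum or the fraction of bases with score 1) is exactly what I am unsure about.

```python
def region_mappability(regions, intervals):
    """Length-weighted mean bedGraph score for each region.

    regions:   list of (chrom, start, end) tuples, BED-style coordinates
    intervals: list of (chrom, start, end, score) bedGraph rows
    Returns one value per region, or None if no interval overlaps it.
    """
    # The Table Browser output repeats the same row once per input region,
    # so drop exact duplicates before averaging.
    unique_intervals = set(intervals)
    results = []
    for chrom, r_start, r_end in regions:
        weighted_sum = 0.0
        covered_bases = 0
        for i_chrom, i_start, i_end, score in unique_intervals:
            if i_chrom != chrom:
                continue
            # Number of bases shared by the region and this interval.
            overlap = min(r_end, i_end) - max(r_start, i_start)
            if overlap > 0:
                weighted_sum += overlap * score
                covered_bases += overlap
        results.append(weighted_sum / covered_bases if covered_bases else None)
    return results


regions = [("chr11", 34470687, 34471096)]
intervals = [("chr11", 34468326, 34478748, 1.0)] * 2  # duplicated, as in the output
print(region_mappability(regions, intervals))
```

With the first region and the 10 kb interval from the output, every overlapping base has a score of 1, so the weighted mean is 1.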

Thanks

mapability UCSC • 2.1k views
