Question

Fine-Tune NGS Downstream Analyses Using Genome Mappability?

12.1 years ago

toni ★ 2.2k

Hi all,

My team has been discussing genome mappability for a long time, but so far we have made no concrete effort to use it in our analyses.

The mappability of a genomic region is the ability of a DNA stretch from that region, when produced by a sequencing experiment, to map back unambiguously to its location on the reference genome (see this paper from Derrien et al.).

On the UCSC website, you can find 'Mappability' tracks tagged as Alignability, Uniqueness or Blacklisted Regions, notably for a range of read lengths.

What I would like to know is:

Do you, in practice today, use this notion to fine-tune your analyses (SNP/CNV calling, coverage, ...)?

EDIT: Answering 'NO', with an explanation of the difficulties you encountered or the reason why you do not use it, is a good answer as well :)

Concretely, the mappability (or uniqueness) is a score between 0 and 1 for each base along the reference genome. I am also wondering to what extent it can be used to define larger (and therefore more usable in practice) mappable regions. Setting a hard threshold on this score could result in a very fragmented genome.
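
One way to picture the trade-off described above is a small sketch that thresholds the per-base score and then bridges short gaps between passing intervals, so the result is less fragmented. It assumes the scores are available as sorted four-column bedGraph lines (chrom, start, end, score); the function name `mappable_regions` and the parameters `min_score`, `max_gap` and `min_length` are hypothetical, and the default values are placeholders rather than recommendations.

```python
from typing import Iterable, Iterator, Tuple

Interval = Tuple[str, int, int]


def mappable_regions(
    bedgraph_lines: Iterable[str],
    min_score: float = 1.0,
    max_gap: int = 0,
    min_length: int = 1,
) -> Iterator[Interval]:
    """Yield (chrom, start, end) regions built from bedGraph intervals.

    Assumes the input is sorted by chromosome and start position.
    Intervals scoring below min_score are dropped; surviving intervals
    on the same chromosome are merged when the gap between them is at
    most max_gap bases; merged regions shorter than min_length are
    discarded.
    """
    current = None
    for line in bedgraph_lines:
        line = line.strip()
        # Skip blank lines and optional bedGraph header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start_s, end_s, score_s = line.split()[:4]
        start, end = int(start_s), int(end_s)
        if float(score_s) < min_score:
            continue
        if current and current[0] == chrom and start - current[2] <= max_gap:
            # Close enough to the open region: extend it.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current and current[2] - current[1] >= min_length:
                yield current
            current = (chrom, start, end)
    if current and current[2] - current[1] >= min_length:
        yield current


if __name__ == "__main__":
    demo = [
        "chr1 0 10 1",
        "chr1 10 20 0.5",
        "chr1 20 30 1",
        "chr1 100 150 1",
    ]
    for chrom, start, end in mappable_regions(demo, max_gap=10, min_length=10):
        print(f"{chrom}\t{start}\t{end}")  # BED-style output
```

With `max_gap=0` this reduces to a plain hard threshold; raising `max_gap` tolerates short low-mappability stretches inside a region, at the cost of including some ambiguous bases.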

BONUS: If someone has a BED file (not WIG) of the mappable genome for read length = 100, I would be very interested.

Thanks in advance.

genome next-gen sequencing mapping analysis • 3.8k views

updated 12.1 years ago by Pascal ★ 1.5k • written 12.1 years ago by toni ★ 2.2k

I would really like to know the answer to this as well: "Do you, in practice today, use this notion to fine-tune your analyses (SNP/CNV calling, coverage, ...)?" Can anyone please answer?

12.1 years ago by Vikas Bansal ★ 2.4k

score 2 · Answer 1 · 2012-03-15

12.1 years ago

Pascal ★ 1.5k

Very interesting question. Regarding the bonus question, why don't you simply download it from this UCSC data download page (the file wgEncodeCrgMapabilityAlign100mer.bw.gz, maybe?) and use bigWigToBedGraph to convert it to bedGraph format?
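
The conversion step can be scripted. The sketch below wraps the UCSC `bigWigToBedGraph` command-line tool (basic usage: `bigWigToBedGraph in.bigWig out.bedGraph`, with an optional `-chrom=` filter) from Python. It assumes the tool is on your PATH and that the downloaded file has already been decompressed if it was gzipped; the helper names and the file names in the usage line are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def bigwig_to_bedgraph_cmd(
    bigwig_path: str, bedgraph_path: str, chrom: Optional[str] = None
) -> List[str]:
    """Build the argument list for the UCSC bigWigToBedGraph tool."""
    cmd = ["bigWigToBedGraph"]
    if chrom is not None:
        # Restrict the output to a single chromosome.
        cmd.append(f"-chrom={chrom}")
    cmd += [bigwig_path, bedgraph_path]
    return cmd


def convert(bigwig_path: str, bedgraph_path: str, chrom: Optional[str] = None) -> None:
    """Run the conversion, failing early if the tool is not installed."""
    if shutil.which("bigWigToBedGraph") is None:
        raise FileNotFoundError("bigWigToBedGraph not found on PATH")
    subprocess.run(
        bigwig_to_bedgraph_cmd(bigwig_path, bedgraph_path, chrom), check=True
    )


# Hypothetical usage, with placeholder file names:
# convert("mappability_100mer.bw", "mappability_100mer.bedGraph", chrom="chr1")
```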

12.1 years ago by Pascal ★ 1.5k

FYI, you can download bigWigToBedGraph from http://hgdownload.cse.ucsc.edu/admin/exe/

12.1 years ago by Pascal ★ 1.5k

Thank you Pascal, I am going to give this a try.

12.1 years ago by toni ★ 2.2k

Dear Pascal, I am a bit confused: chr1, for instance, starts with 'chr1 0 14 0.00277778', yet there are 10000 N's at the beginning of the chr1 sequence, so mappability should be 0 at these positions. Any idea why? An offset or something? Thanks

12.1 years ago by toni ★ 2.2k

I did not notice at first that you had pointed me to an hg18 link. With hg19, the bedGraph file looks better.

12.1 years ago by toni ★ 2.2k

Ram · Answer 2 · 2012-03-15

12.1 years ago

Ian 6.0k

If there is no mappability data for your genome/read length, you may find ProMap useful: it generates mappability profiles for use with PICS (an R-based ChIP-seq peak caller by the Gottardo lab).

This BioStars question may also help.

updated 4.6 years ago by Ram 43k • written 12.1 years ago by Ian 6.0k