Question

Where can I get ?or how can I make a mappability track for hg38 assembly

1

Entering edit mode

8.7 years ago

manojkumar_bhosale ▴ 80

How can I make a mappability track for hg38 assembly. I have explored the GEM library option to generate the mappability track but the software does not have binary for generating mappability track(http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377). It will be very helpful if I get some pointers in this regard..

sequencing alignment Assembly sequence • 12k views

ADD COMMENT • link 3.2 years ago by manojkumar_bhosale ▴ 80

0

Entering edit mode

What do you mean GEM doesn't have a library for this? That's what gem-mappability is for. I just downloaded the binary and it's there.

ADD REPLY • link 8.7 years ago by Devon Ryan 104k

1

Entering edit mode

My mistake !! I looked it in binary-pre-release-2, which does not have the binary for mappability but release 3 has it. Thanks !!

ADD REPLY • link 8.7 years ago by manojkumar_bhosale ▴ 80

1

Entering edit mode

As of September 2017 the gem-mappability binary is missing from the GEM build. The note on the GEM wiki dated 2012 states "mappability tools are still in the process of being migrated to the new mapping engine. The mappability will come back online shortly." (wiki for the GEM library). There is also no ENCODE version of the mappability track for hg38. Is my best bet really doing a liftOver on wgEncodeCrgMapabilityAlign100mer.bigWig from hg19 to hg38?

ADD REPLY • link 7.2 years ago by David Quigley 11k

0

Entering edit mode

"rn6.softmask.all.fa" is the reference genome file with repetitive regions soft masked.

ADD REPLY • link 7.0 years ago by manojkumar_bhosale ▴ 80

0

Entering edit mode

many thanks, I understand what the rn6.softmask.all.fa file is. I dont undestand what rn6.softmask.all file is (the line above) Are you saying they are the same thing?

ADD REPLY • link 7.0 years ago by ildem ▴ 60

0

Entering edit mode

Well, that line is just the declaration of variable named "pref" which is used in code below wherever the file name prefix is required. Simply put, its used to create file name prefix.

For example, below line will assign variable "idxpref" with value "rn6.softmask.all_index"

idxpref="${pref}_index"

ADD REPLY • link 7.0 years ago by manojkumar_bhosale ▴ 80

0

Entering edit mode

oh! thanks so much! I assumed i was missing an index file of some sort!!

thanks again..

ADD REPLY • link 7.0 years ago by ildem ▴ 60

0

Entering edit mode

If you are working with UCSC hg19 there are pre-built mappability files from ENCODE available here.

ADD REPLY • link 5.3 years ago by kevin.stachelek ▴ 80

0

Entering edit mode

I am running the following code:gem-indexer -i Homo_sapiens-GRCh38.p13_contigs.fa -o Homo_sapiens-GRCh38.p13_contigs -threads 8

And I am getting this error:

GEM::Error (archive_builder_text.c:180,archive_builder_text_process_character) MultiFASTA parsing (genome/Homo_sapiens-GRCh38.p13/Homo_sapiens-GRCh38.p13_contigs.fa:46604068). Invalid character found ('>')

Any idea how to solve it?

ADD REPLY • link 3.2 years ago by Oleczkaff • 0

0

Entering edit mode

May be Homo_sapiens-GRCh38.p13_contigs.fa file has the ">" as a part of sequence instead of newline. Check the file once.

ADD REPLY • link 3.2 years ago by manojkumar_bhosale ▴ 80

2

Entering edit mode

6.3 years ago

shashankbic ▴ 10

Script from @biocyberman is great. For those who are still looking for this, here are steps:

1) Get the reference hg38 genome, in uncompressed fasta format (obviously)

2) Get the 'gemtools': a) SourceForge --> Version 2, pre release or b) GitLab --> Version 1.7

3) Create the index of the genome by using following command:

gemtools index -i <referenceGenome.fasta:PATH> -t <NumberOfThreadsToUse:int>     
e.g.:
gemtools index -i GRCh38.p12.fasta -t 40

4) Once, the index is made, generate the mappability data like:

gem-mappability -I <referenceGenome_index.gem:PATH> -l <K-mer_length:int> -o <output_name.gem:PATH>
e.g.: For 100 mer, the command will be:    
gem-mappability -I GRCh38.p12.gem -l 100 -o GRCh38_mappability_100mer.gem

The complete guide is available at this wiki

ADD COMMENT • link 6.3 years ago by shashankbic ▴ 10

0

Entering edit mode

Weirdly, the gemtools index command works for me, but not gem-indexer (as in the wiki). However, when I run gem-mappability, it gives me this error:

Loading index (likely to take long)...Illegal instruction (core dumped)

I don't know what is the wrong here.

ADD REPLY • link 6.1 years ago by pyjiang2 ▴ 40

0

Entering edit mode

Looks like memory issue. Did you check you have enough memory(disk space) to run the command?

ADD REPLY • link 6.0 years ago by manojkumar_bhosale ▴ 80

0

Entering edit mode

Thanks! It is indeed memory issue!!

ADD REPLY • link 6.0 years ago by pyjiang2 ▴ 40

0

Entering edit mode

I used the following argument to get through the memory issue: gem-indexer --max-memory 'force-slow-algorithm'

ADD REPLY • link 5.7 years ago by dineika_chandra ▴ 30

0

Entering edit mode

That's great! I've one question, Is there any difference in generating mappability for paired-end reads? Can I use the GEM mappalibity generated from single-end reads to paired-end libraries?

ADD REPLY • link 5.3 years ago by wszheng1010 • 0

score 11 · Accepted Answer · 2016-03-10

11

Entering edit mode

8.7 years ago

biocyberman ▴ 870

Lucky you @manojmumar_bhosale I worked on similar problem recently and therefore have the bash script you can use. Required tools:

GEM libary from here

UCSC's wigToBigWig from here (I chose binary for Linux 64 bit, you can choose different OS)

#!/bin/bash - 
#===============================================================================
#
#          FILE: mappa_cacl.sh
# 
#         USAGE: ./mappa_cacl.sh 
# 
#   DESCRIPTION: 
# 
#       OPTIONS: ---
#  REQUIREMENTS: ---
#          BUGS: ---
#         NOTES: ---
#        AUTHOR: Vang Le (vql AT rn.dk) 
#  ORGANIZATION: 
#       CREATED: 26/02/16 01:49
#      REVISION:  ---
#      Reference: http://wiki.bits.vib.be/index.php/Create_a_mappability_track
#===============================================================================

set -o nounset                              # Treat unset variables as an error

pref="rn6.softmask.all" # should be changed
reference="rn6.softmask.all.fa" # should be changed
idxpref="${pref}_index"
thr=20; # use 20 cores, change it to the number you want to use
outmappa="rn6.mappa.tsv" # should be changed
#make index for the genome
gem-indexer -T ${thr} -c dna -i ${reference} -o ${idxpref}

# The following calculates index and creates mappability tracks with various kmer lengths.
# it may take long time to finish.
# choose the right kmer length for you.
for kmer in 45 50 75; do

  # compute mappability data
   gem-mappability -T ${thr} -I ${idxpref}.gem -l ${kmer} -o ${pref}_${kmer}
  mpc=$(gawk '{c+=gsub(s,s)}END{print c}' s='!' ${pref}_${kmer}.mappability)
  echo ${pref}_${kmer}"\t"$mpc >> $outmappa
  # convert results to wig and bigwig
  gem-2-wig -I ${idxpref}.gem -i ${pref}_${kmer}.mappability -o ${pref}_${kmer}
  wigToBigWig ${pref}_${kmer}.wig ${pref}.sizes ${pref}_${kmer}.bw

done

ADD COMMENT • link 8.7 years ago by biocyberman ▴ 870

0

Entering edit mode

Thank you very much !! You saved me lot of efforts.

ADD REPLY • link 8.7 years ago by manojkumar_bhosale ▴ 80

0

Entering edit mode

This is a great script. Just a small detail; the second link redirects to bigWigToWig (instead of wigToBigWig).

ADD REPLY • link 7.3 years ago by Karol Pal jr. ▴ 20

0

Entering edit mode

May i ask what exactly is this file?

pref="rn6.softmask.all" # should be changed

is it some sort of index? Did you generate it yourself?

thanks..

ADD REPLY • link 7.0 years ago by ildem ▴ 60

0

Entering edit mode

It's just the prefix used for filenames. As you can see it was used in the following line and later as ${pref}.

ADD REPLY • link 6.5 years ago by binknis • 0

0

Entering edit mode

Excellent script and thanks! I managed to run until the third step but wigToBigWig returned an error: strange var=val pair line 1 of CanFam3_1_mappability_gemlib_50.wig.

I am running command:

wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw

CanFam3_1_mappability_gemlib_50.wig looks like this:

variableStep    chrom=1 dna     span=100
1       0
variableStep    chrom=1 dna     span=12
101     0.5
variableStep    chrom=1 dna     span=16
113     0.0236624
variableStep    chrom=1 dna     span=2
129     0.0186339
variableStep    chrom=1 dna     span=3
131     0.000155916

ADD REPLY • link 6.4 years ago by Samir ▴ 210

0

Entering edit mode

Never mind and it is resolved. Somehow wig file had that extra dna\t in it, so I removed that first, and then was able to convert wig to BigWig.

sed -i.bak -E 's/dna[\t]{0,1}//' CanFam3_1_mappability_gemlib_50.wig
wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw

ADD REPLY • link 6.4 years ago by Samir ▴ 210