Question: Where can I get ?or how can I make a mappability track for hg38 assembly
0
gravatar for manojkumar_bhosale
2.7 years ago by
manojkumar_bhosale60 wrote:

How can I make a mappability track for hg38 assembly. I have explored the GEM library option to generate the mappability track but the software does not have binary for generating mappability track(http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377). It will be very helpful if I get some pointers in this regard..

ADD COMMENTlink modified 4 months ago by shashankbic10 • written 2.7 years ago by manojkumar_bhosale60

What do you mean GEM doesn't have a library for this? That's what gem-mappability is for. I just downloaded the binary and it's there.

ADD REPLYlink written 2.7 years ago by Devon Ryan86k
1

My mistake !! I looked it in binary-pre-release-2, which does not have the binary for mappability but release 3 has it. Thanks !!

ADD REPLYlink written 2.7 years ago by manojkumar_bhosale60
1

As of September 2017 the gem-mappability binary is missing from the GEM build. The note on the GEM wiki dated 2012 states "mappability tools are still in the process of being migrated to the new mapping engine. The mappability will come back online shortly." (wiki for the GEM library). There is also no ENCODE version of the mappability track for hg38. Is my best bet really doing a liftOver on wgEncodeCrgMapabilityAlign100mer.bigWig from hg19 to hg38?

ADD REPLYlink written 14 months ago by David Quigley11k

"rn6.softmask.all.fa" is the reference genome file with repetitive regions soft masked.

ADD REPLYlink written 11 months ago by manojkumar_bhosale60

many thanks, I understand what the rn6.softmask.all.fa file is. I dont undestand what rn6.softmask.all file is (the line above) Are you saying they are the same thing?

ADD REPLYlink written 11 months ago by ildem20

Well, that line is just the declaration of variable named "pref" which is used in code below wherever the file name prefix is required. Simply put, its used to create file name prefix.

For example, below line will assign variable "idxpref" with value "rn6.softmask.all_index"

idxpref="${pref}_index"

ADD REPLYlink written 11 months ago by manojkumar_bhosale60

oh! thanks so much! I assumed i was missing an index file of some sort!!

thanks again..

ADD REPLYlink written 11 months ago by ildem20
10
gravatar for biocyberman
2.7 years ago by
biocyberman770
Denmark
biocyberman770 wrote:

Lucky you @manojmumar_bhosale I worked on similar problem recently and therefore have the bash script you can use. Required tools:

  1. GEM libary from here
  2. UCSC's wigToBigWig from here (I chose binary for Linux 64 bit, you can choose different OS)

    #!/bin/bash - 
    #===============================================================================
    #
    #          FILE: mappa_cacl.sh
    # 
    #         USAGE: ./mappa_cacl.sh 
    # 
    #   DESCRIPTION: 
    # 
    #       OPTIONS: ---
    #  REQUIREMENTS: ---
    #          BUGS: ---
    #         NOTES: ---
    #        AUTHOR: Vang Le (vql AT rn.dk) 
    #  ORGANIZATION: 
    #       CREATED: 26/02/16 01:49
    #      REVISION:  ---
    #      Reference: http://wiki.bits.vib.be/index.php/Create_a_mappability_track
    #===============================================================================
    
    set -o nounset                              # Treat unset variables as an error
    
    pref="rn6.softmask.all" # should be changed
    reference="rn6.softmask.all.fa" # should be changed
    idxpref="${pref}_index"
    thr=20; # use 20 cores, change it to the number you want to use
    outmappa="rn6.mappa.tsv" # should be changed
    #make index for the genome
    gem-indexer -T ${thr} -c dna -i ${reference} -o ${idxpref}
    
    # The following calculates index and creates mappability tracks with various kmer lengths.
    # it may take long time to finish.
    # choose the right kmer length for you.
    for kmer in 45 50 75; do
    
      # compute mappability data
       gem-mappability -T ${thr} -I ${idxpref}.gem -l ${kmer} -o ${pref}_${kmer}
      mpc=$(gawk '{c+=gsub(s,s)}END{print c}' s='!' ${pref}_${kmer}.mappability)
      echo ${pref}_${kmer}"\t"$mpc >> $outmappa
      # convert results to wig and bigwig
      gem-2-wig -I ${idxpref}.gem -i ${pref}_${kmer}.mappability -o ${pref}_${kmer}
      wigToBigWig ${pref}_${kmer}.wig ${pref}.sizes ${pref}_${kmer}.bw
    
    done
    
ADD COMMENTlink written 2.7 years ago by biocyberman770

Thank you very much !! You saved me lot of efforts.

ADD REPLYlink written 2.7 years ago by manojkumar_bhosale60

This is a great script. Just a small detail; the second link redirects to bigWigToWig (instead of wigToBigWig).

ADD REPLYlink written 16 months ago by Karol Pal jr.20

May i ask what exactly is this file?

pref="rn6.softmask.all" # should be changed

is it some sort of index? Did you generate it yourself?

thanks..

ADD REPLYlink written 11 months ago by ildem20

It's just the prefix used for filenames. As you can see it was used in the following line and later as ${pref}.

ADD REPLYlink written 5 months ago by binknis0

Excellent script and thanks! I managed to run until the third step but wigToBigWig returned an error: strange var=val pair line 1 of CanFam3_1_mappability_gemlib_50.wig.

I am running command:

wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw
  • CanFam3_1_mappability_gemlib_50.wig looks like this:
variableStep    chrom=1 dna     span=100
1       0
variableStep    chrom=1 dna     span=12
101     0.5
variableStep    chrom=1 dna     span=16
113     0.0236624
variableStep    chrom=1 dna     span=2
129     0.0186339
variableStep    chrom=1 dna     span=3
131     0.000155916
ADD REPLYlink modified 5 months ago • written 5 months ago by Samir130

Never mind and it is resolved. Somehow wig file had that extra dna\t in it, so I removed that first, and then was able to convert wig to BigWig.

sed -i.bak -E 's/dna[\t]{0,1}//' CanFam3_1_mappability_gemlib_50.wig
wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw
ADD REPLYlink written 5 months ago by Samir130
2
gravatar for shashankbic
4 months ago by
shashankbic10
shashankbic10 wrote:

Script from @biocyberman is great. For those who are still looking for this, here are steps:

1) Get the reference hg38 genome, in uncompressed fasta format (obviously)

2) Get the 'gemtools': a) SourceForge --> Version 2, pre release or b) GitLab --> Version 1.7

3) Create the index of the genome by using following command:

gemtools index -i <referenceGenome.fasta:PATH> -t <NumberOfThreadsToUse:int>     
e.g.:
gemtools index -i GRCh38.p12.fasta -t 40

4) Once, the index is made, generate the mappability data like:

gem-mappability -I <referenceGenome_index.gem:PATH> -l <K-mer_length:int> -o <output_name.gem:PATH>
e.g.: For 100 mer, the command will be:    
gem-mappability -I GRCh38.p12.gem -l 100 -o GRCh38_mappability_100mer.gem

The complete guide is available at this wiki

ADD COMMENTlink written 4 months ago by shashankbic10

Weirdly, the gemtools index command works for me, but not gem-indexer (as in the wiki). However, when I run gem-mappability, it gives me this error:

Loading index (likely to take long)...Illegal instruction (core dumped)

I don't know what is the wrong here.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by pyjiang230

Looks like memory issue. Did you check you have enough memory(disk space) to run the command?

ADD REPLYlink written 26 days ago by manojkumar_bhosale60

Thanks! It is indeed memory issue!!

ADD REPLYlink written 26 days ago by pyjiang230
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1688 users visited in the last hour