Where can I get ?or how can I make a mappability track for hg38 assembly
2
0
Entering edit mode
8.5 years ago

How can I make a mappability track for hg38 assembly. I have explored the GEM library option to generate the mappability track but the software does not have binary for generating mappability track(http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377). It will be very helpful if I get some pointers in this regard..

sequencing alignment Assembly sequence • 12k views
ADD COMMENT
0
Entering edit mode

What do you mean GEM doesn't have a library for this? That's what gem-mappability is for. I just downloaded the binary and it's there.

ADD REPLY
1
Entering edit mode

My mistake !! I looked it in binary-pre-release-2, which does not have the binary for mappability but release 3 has it. Thanks !!

ADD REPLY
1
Entering edit mode

As of September 2017 the gem-mappability binary is missing from the GEM build. The note on the GEM wiki dated 2012 states "mappability tools are still in the process of being migrated to the new mapping engine. The mappability will come back online shortly." (wiki for the GEM library). There is also no ENCODE version of the mappability track for hg38. Is my best bet really doing a liftOver on wgEncodeCrgMapabilityAlign100mer.bigWig from hg19 to hg38?

ADD REPLY
0
Entering edit mode

"rn6.softmask.all.fa" is the reference genome file with repetitive regions soft masked.

ADD REPLY
0
Entering edit mode

many thanks, I understand what the rn6.softmask.all.fa file is. I dont undestand what rn6.softmask.all file is (the line above) Are you saying they are the same thing?

ADD REPLY
0
Entering edit mode

Well, that line is just the declaration of variable named "pref" which is used in code below wherever the file name prefix is required. Simply put, its used to create file name prefix.

For example, below line will assign variable "idxpref" with value "rn6.softmask.all_index"

idxpref="${pref}_index"

ADD REPLY
0
Entering edit mode

oh! thanks so much! I assumed i was missing an index file of some sort!!

thanks again..

ADD REPLY
0
Entering edit mode

If you are working with UCSC hg19 there are pre-built mappability files from ENCODE available here.

ADD REPLY
0
Entering edit mode

I am running the following code:gem-indexer -i Homo_sapiens-GRCh38.p13_contigs.fa -o Homo_sapiens-GRCh38.p13_contigs -threads 8

And I am getting this error:

GEM::Error (archive_builder_text.c:180,archive_builder_text_process_character) MultiFASTA parsing (genome/Homo_sapiens-GRCh38.p13/Homo_sapiens-GRCh38.p13_contigs.fa:46604068). Invalid character found ('>')

Any idea how to solve it?

ADD REPLY
0
Entering edit mode

May be Homo_sapiens-GRCh38.p13_contigs.fa file has the ">" as a part of sequence instead of newline. Check the file once.

ADD REPLY
11
Entering edit mode
8.5 years ago
biocyberman ▴ 860

Lucky you @manojmumar_bhosale I worked on similar problem recently and therefore have the bash script you can use. Required tools:

  1. GEM libary from here
  2. UCSC's wigToBigWig from here (I chose binary for Linux 64 bit, you can choose different OS)

    #!/bin/bash - 
    #===============================================================================
    #
    #          FILE: mappa_cacl.sh
    # 
    #         USAGE: ./mappa_cacl.sh 
    # 
    #   DESCRIPTION: 
    # 
    #       OPTIONS: ---
    #  REQUIREMENTS: ---
    #          BUGS: ---
    #         NOTES: ---
    #        AUTHOR: Vang Le (vql AT rn.dk) 
    #  ORGANIZATION: 
    #       CREATED: 26/02/16 01:49
    #      REVISION:  ---
    #      Reference: http://wiki.bits.vib.be/index.php/Create_a_mappability_track
    #===============================================================================
    
    set -o nounset                              # Treat unset variables as an error
    
    pref="rn6.softmask.all" # should be changed
    reference="rn6.softmask.all.fa" # should be changed
    idxpref="${pref}_index"
    thr=20; # use 20 cores, change it to the number you want to use
    outmappa="rn6.mappa.tsv" # should be changed
    #make index for the genome
    gem-indexer -T ${thr} -c dna -i ${reference} -o ${idxpref}
    
    # The following calculates index and creates mappability tracks with various kmer lengths.
    # it may take long time to finish.
    # choose the right kmer length for you.
    for kmer in 45 50 75; do
    
      # compute mappability data
       gem-mappability -T ${thr} -I ${idxpref}.gem -l ${kmer} -o ${pref}_${kmer}
      mpc=$(gawk '{c+=gsub(s,s)}END{print c}' s='!' ${pref}_${kmer}.mappability)
      echo ${pref}_${kmer}"\t"$mpc >> $outmappa
      # convert results to wig and bigwig
      gem-2-wig -I ${idxpref}.gem -i ${pref}_${kmer}.mappability -o ${pref}_${kmer}
      wigToBigWig ${pref}_${kmer}.wig ${pref}.sizes ${pref}_${kmer}.bw
    
    done
    
ADD COMMENT
0
Entering edit mode

Thank you very much !! You saved me lot of efforts.

ADD REPLY
0
Entering edit mode

This is a great script. Just a small detail; the second link redirects to bigWigToWig (instead of wigToBigWig).

ADD REPLY
0
Entering edit mode

May i ask what exactly is this file?

pref="rn6.softmask.all" # should be changed

is it some sort of index? Did you generate it yourself?

thanks..

ADD REPLY
0
Entering edit mode

It's just the prefix used for filenames. As you can see it was used in the following line and later as ${pref}.

ADD REPLY
0
Entering edit mode

Excellent script and thanks! I managed to run until the third step but wigToBigWig returned an error: strange var=val pair line 1 of CanFam3_1_mappability_gemlib_50.wig.

I am running command:

wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw
  • CanFam3_1_mappability_gemlib_50.wig looks like this:
variableStep    chrom=1 dna     span=100
1       0
variableStep    chrom=1 dna     span=12
101     0.5
variableStep    chrom=1 dna     span=16
113     0.0236624
variableStep    chrom=1 dna     span=2
129     0.0186339
variableStep    chrom=1 dna     span=3
131     0.000155916
ADD REPLY
0
Entering edit mode

Never mind and it is resolved. Somehow wig file had that extra dna\t in it, so I removed that first, and then was able to convert wig to BigWig.

sed -i.bak -E 's/dna[\t]{0,1}//' CanFam3_1_mappability_gemlib_50.wig
wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw
ADD REPLY
2
Entering edit mode
6.2 years ago
shashankbic ▴ 10

Script from @biocyberman is great. For those who are still looking for this, here are steps:

1) Get the reference hg38 genome, in uncompressed fasta format (obviously)

2) Get the 'gemtools': a) SourceForge --> Version 2, pre release or b) GitLab --> Version 1.7

3) Create the index of the genome by using following command:

gemtools index -i <referenceGenome.fasta:PATH> -t <NumberOfThreadsToUse:int>     
e.g.:
gemtools index -i GRCh38.p12.fasta -t 40

4) Once, the index is made, generate the mappability data like:

gem-mappability -I <referenceGenome_index.gem:PATH> -l <K-mer_length:int> -o <output_name.gem:PATH>
e.g.: For 100 mer, the command will be:    
gem-mappability -I GRCh38.p12.gem -l 100 -o GRCh38_mappability_100mer.gem

The complete guide is available at this wiki

ADD COMMENT
0
Entering edit mode

Weirdly, the gemtools index command works for me, but not gem-indexer (as in the wiki). However, when I run gem-mappability, it gives me this error:

Loading index (likely to take long)...Illegal instruction (core dumped)

I don't know what is the wrong here.

ADD REPLY
0
Entering edit mode

Looks like memory issue. Did you check you have enough memory(disk space) to run the command?

ADD REPLY
0
Entering edit mode

Thanks! It is indeed memory issue!!

ADD REPLY
0
Entering edit mode

I used the following argument to get through the memory issue: gem-indexer --max-memory 'force-slow-algorithm'

ADD REPLY
0
Entering edit mode

That's great! I've one question, Is there any difference in generating mappability for paired-end reads? Can I use the GEM mappalibity generated from single-end reads to paired-end libraries?

ADD REPLY

Login before adding your answer.

Traffic: 1769 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6