Where can I get ?or how can I make a mappability track for hg38 assembly
2
0
Entering edit mode
5.6 years ago

How can I make a mappability track for hg38 assembly. I have explored the GEM library option to generate the mappability track but the software does not have binary for generating mappability track(http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377). It will be very helpful if I get some pointers in this regard..

sequencing alignment Assembly sequence • 7.7k views
0
Entering edit mode

What do you mean GEM doesn't have a library for this? That's what gem-mappability is for. I just downloaded the binary and it's there.

1
Entering edit mode

My mistake !! I looked it in binary-pre-release-2, which does not have the binary for mappability but release 3 has it. Thanks !!

1
Entering edit mode

As of September 2017 the gem-mappability binary is missing from the GEM build. The note on the GEM wiki dated 2012 states "mappability tools are still in the process of being migrated to the new mapping engine. The mappability will come back online shortly." (wiki for the GEM library). There is also no ENCODE version of the mappability track for hg38. Is my best bet really doing a liftOver on wgEncodeCrgMapabilityAlign100mer.bigWig from hg19 to hg38?

0
Entering edit mode

0
Entering edit mode

many thanks, I understand what the rn6.softmask.all.fa file is. I dont undestand what rn6.softmask.all file is (the line above) Are you saying they are the same thing?

0
Entering edit mode

Well, that line is just the declaration of variable named "pref" which is used in code below wherever the file name prefix is required. Simply put, its used to create file name prefix.

For example, below line will assign variable "idxpref" with value "rn6.softmask.all_index"

idxpref="${pref}_index" ADD REPLY 0 Entering edit mode oh! thanks so much! I assumed i was missing an index file of some sort!! thanks again.. ADD REPLY 0 Entering edit mode If you are working with UCSC hg19 there are pre-built mappability files from ENCODE available here. ADD REPLY 0 Entering edit mode I am running the following code:gem-indexer -i Homo_sapiens-GRCh38.p13_contigs.fa -o Homo_sapiens-GRCh38.p13_contigs -threads 8 And I am getting this error: GEM::Error (archive_builder_text.c:180,archive_builder_text_process_character) MultiFASTA parsing (genome/Homo_sapiens-GRCh38.p13/Homo_sapiens-GRCh38.p13_contigs.fa:46604068). Invalid character found ('>') Any idea how to solve it? ADD REPLY 0 Entering edit mode May be Homo_sapiens-GRCh38.p13_contigs.fa file has the ">" as a part of sequence instead of newline. Check the file once. ADD REPLY 10 Entering edit mode 5.6 years ago biocyberman ▴ 830 Lucky you @manojmumar_bhosale I worked on similar problem recently and therefore have the bash script you can use. Required tools: 1. GEM libary from here 2. UCSC's wigToBigWig from here (I chose binary for Linux 64 bit, you can choose different OS) #!/bin/bash - #=============================================================================== # # FILE: mappa_cacl.sh # # USAGE: ./mappa_cacl.sh # # DESCRIPTION: # # OPTIONS: --- # REQUIREMENTS: --- # BUGS: --- # NOTES: --- # AUTHOR: Vang Le (vql AT rn.dk) # ORGANIZATION: # CREATED: 26/02/16 01:49 # REVISION: --- # Reference: http://wiki.bits.vib.be/index.php/Create_a_mappability_track #=============================================================================== set -o nounset # Treat unset variables as an error pref="rn6.softmask.all" # should be changed reference="rn6.softmask.all.fa" # should be changed idxpref="${pref}_index"
thr=20; # use 20 cores, change it to the number you want to use
outmappa="rn6.mappa.tsv" # should be changed
#make index for the genome
gem-indexer -T ${thr} -c dna -i${reference} -o ${idxpref} # The following calculates index and creates mappability tracks with various kmer lengths. # it may take long time to finish. # choose the right kmer length for you. for kmer in 45 50 75; do # compute mappability data gem-mappability -T${thr} -I ${idxpref}.gem -l${kmer} -o ${pref}_${kmer}
mpc=$(gawk '{c+=gsub(s,s)}END{print c}' s='!'${pref}_${kmer}.mappability) echo${pref}_${kmer}"\t"$mpc >> $outmappa # convert results to wig and bigwig gem-2-wig -I${idxpref}.gem -i ${pref}_${kmer}.mappability -o ${pref}_${kmer}
wigToBigWig ${pref}_${kmer}.wig ${pref}.sizes${pref}_${kmer}.bw done  ADD COMMENT 0 Entering edit mode Thank you very much !! You saved me lot of efforts. ADD REPLY 0 Entering edit mode This is a great script. Just a small detail; the second link redirects to bigWigToWig (instead of wigToBigWig). ADD REPLY 0 Entering edit mode May i ask what exactly is this file? pref="rn6.softmask.all" # should be changed is it some sort of index? Did you generate it yourself? thanks.. ADD REPLY 0 Entering edit mode It's just the prefix used for filenames. As you can see it was used in the following line and later as${pref}.

0
Entering edit mode

Excellent script and thanks! I managed to run until the third step but wigToBigWig returned an error: strange var=val pair line 1 of CanFam3_1_mappability_gemlib_50.wig.

I am running command:

wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw

• CanFam3_1_mappability_gemlib_50.wig looks like this:
variableStep    chrom=1 dna     span=100
1       0
variableStep    chrom=1 dna     span=12
101     0.5
variableStep    chrom=1 dna     span=16
113     0.0236624
variableStep    chrom=1 dna     span=2
129     0.0186339
variableStep    chrom=1 dna     span=3
131     0.000155916

0
Entering edit mode

Never mind and it is resolved. Somehow wig file had that extra dna\t in it, so I removed that first, and then was able to convert wig to BigWig.

sed -i.bak -E 's/dna[\t]{0,1}//' CanFam3_1_mappability_gemlib_50.wig
wigToBigWig CanFam3_1_mappability_gemlib_50.wig CanFam3_1.fa.sizes CanFam3_1_mappability_gemlib_50.bw

2
Entering edit mode
3.3 years ago
shashankbic ▴ 10

Script from @biocyberman is great. For those who are still looking for this, here are steps:

1) Get the reference hg38 genome, in uncompressed fasta format (obviously)

2) Get the 'gemtools': a) SourceForge --> Version 2, pre release or b) GitLab --> Version 1.7

3) Create the index of the genome by using following command:

gemtools index -i <referenceGenome.fasta:PATH> -t <NumberOfThreadsToUse:int>
e.g.:
gemtools index -i GRCh38.p12.fasta -t 40


4) Once, the index is made, generate the mappability data like:

gem-mappability -I <referenceGenome_index.gem:PATH> -l <K-mer_length:int> -o <output_name.gem:PATH>
e.g.: For 100 mer, the command will be:
gem-mappability -I GRCh38.p12.gem -l 100 -o GRCh38_mappability_100mer.gem


The complete guide is available at this wiki

0
Entering edit mode

Weirdly, the gemtools index command works for me, but not gem-indexer (as in the wiki). However, when I run gem-mappability, it gives me this error:

Loading index (likely to take long)...Illegal instruction (core dumped)

I don't know what is the wrong here.

0
Entering edit mode

Looks like memory issue. Did you check you have enough memory(disk space) to run the command?

0
Entering edit mode

Thanks! It is indeed memory issue!!

0
Entering edit mode

I used the following argument to get through the memory issue: gem-indexer --max-memory 'force-slow-algorithm'

0
Entering edit mode

That's great! I've one question, Is there any difference in generating mappability for paired-end reads? Can I use the GEM mappalibity generated from single-end reads to paired-end libraries?