How can I make a mappability track for hg38 assembly. I have explored the GEM library option to generate the mappability track but the software does not have binary for generating mappability track(http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030377). It will be very helpful if I get some pointers in this regard..
As of September 2017 the gem-mappability binary is missing from the GEM build. The note on the GEM wiki dated 2012 states "mappability tools are still in the process of being migrated to the new mapping engine. The mappability will come back online shortly." (wiki for the GEM library). There is also no ENCODE version of the mappability track for hg38. Is my best bet really doing a liftOver on wgEncodeCrgMapabilityAlign100mer.bigWig from hg19 to hg38?
many thanks,
I understand what the rn6.softmask.all.fa file is.
I dont undestand what rn6.softmask.all file is (the line above)
Are you saying they are the same thing?
Well, that line is just the declaration of variable named "pref" which is used in code below wherever the file name prefix is required. Simply put, its used to create file name prefix.
For example, below line will assign variable "idxpref" with value "rn6.softmask.all_index"
I am running the following code:gem-indexer -i Homo_sapiens-GRCh38.p13_contigs.fa -o Homo_sapiens-GRCh38.p13_contigs -threads 8
And I am getting this error:
GEM::Error (archive_builder_text.c:180,archive_builder_text_process_character)
MultiFASTA parsing (genome/Homo_sapiens-GRCh38.p13/Homo_sapiens-GRCh38.p13_contigs.fa:46604068). Invalid character found ('>')
UCSC's wigToBigWig from here (I chose binary for Linux 64 bit, you can choose different OS)
#!/bin/bash -
#===============================================================================
#
# FILE: mappa_cacl.sh
#
# USAGE: ./mappa_cacl.sh
#
# DESCRIPTION:
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Vang Le (vql AT rn.dk)
# ORGANIZATION:
# CREATED: 26/02/16 01:49
# REVISION: ---
# Reference: http://wiki.bits.vib.be/index.php/Create_a_mappability_track
#===============================================================================
set -o nounset # Treat unset variables as an error
pref="rn6.softmask.all" # should be changed
reference="rn6.softmask.all.fa" # should be changed
idxpref="${pref}_index"
thr=20; # use 20 cores, change it to the number you want to use
outmappa="rn6.mappa.tsv" # should be changed
#make index for the genome
gem-indexer -T ${thr} -c dna -i ${reference} -o ${idxpref}
# The following calculates index and creates mappability tracks with various kmer lengths.
# it may take long time to finish.
# choose the right kmer length for you.
for kmer in 45 50 75; do
# compute mappability data
gem-mappability -T ${thr} -I ${idxpref}.gem -l ${kmer} -o ${pref}_${kmer}
mpc=$(gawk '{c+=gsub(s,s)}END{print c}' s='!' ${pref}_${kmer}.mappability)
echo ${pref}_${kmer}"\t"$mpc >> $outmappa
# convert results to wig and bigwig
gem-2-wig -I ${idxpref}.gem -i ${pref}_${kmer}.mappability -o ${pref}_${kmer}
wigToBigWig ${pref}_${kmer}.wig ${pref}.sizes ${pref}_${kmer}.bw
done
Excellent script and thanks! I managed to run until the third step but wigToBigWig returned an error: strange var=val pair line 1 of CanFam3_1_mappability_gemlib_50.wig.
That's great! I've one question, Is there any difference in generating mappability for paired-end reads? Can I use the GEM mappalibity generated from single-end reads to paired-end libraries?
What do you mean GEM doesn't have a library for this? That's what
gem-mappability
is for. I just downloaded the binary and it's there.My mistake !! I looked it in binary-pre-release-2, which does not have the binary for mappability but release 3 has it. Thanks !!
As of September 2017 the gem-mappability binary is missing from the GEM build. The note on the GEM wiki dated 2012 states "mappability tools are still in the process of being migrated to the new mapping engine. The mappability will come back online shortly." (wiki for the GEM library). There is also no ENCODE version of the mappability track for hg38. Is my best bet really doing a liftOver on wgEncodeCrgMapabilityAlign100mer.bigWig from hg19 to hg38?
"rn6.softmask.all.fa" is the reference genome file with repetitive regions soft masked.
many thanks, I understand what the rn6.softmask.all.fa file is. I dont undestand what rn6.softmask.all file is (the line above) Are you saying they are the same thing?
Well, that line is just the declaration of variable named "pref" which is used in code below wherever the file name prefix is required. Simply put, its used to create file name prefix.
For example, below line will assign variable "idxpref" with value "rn6.softmask.all_index"
idxpref="${pref}_index"
oh! thanks so much! I assumed i was missing an index file of some sort!!
thanks again..
If you are working with UCSC hg19 there are pre-built mappability files from ENCODE available here.
I am running the following code:
gem-indexer -i Homo_sapiens-GRCh38.p13_contigs.fa -o Homo_sapiens-GRCh38.p13_contigs -threads 8
And I am getting this error:
GEM::Error (archive_builder_text.c:180,archive_builder_text_process_character) MultiFASTA parsing (genome/Homo_sapiens-GRCh38.p13/Homo_sapiens-GRCh38.p13_contigs.fa:46604068). Invalid character found ('>')
Any idea how to solve it?
May be Homo_sapiens-GRCh38.p13_contigs.fa file has the ">" as a part of sequence instead of newline. Check the file once.