Overall mappability of genome
0
0
Entering edit mode
3.0 years ago
boczniak767 ▴ 850

Hi,

thanks to extremelly useful page gem-mappability I know how to calculate this measure for each base of my sequence. It looks nice in IGV as transposons have zeroes.

The question is, how could I get one value (percentage) of uniquelly mappable genome? I need this value for peak-caller (Homer, MACS).

gem mappability ngs • 1.0k views
ADD COMMENT
1
Entering edit mode

Guess you could count bases that have a non-zero mappability value and use that number. This likely does not need to be absolutely precise.

ADD REPLY
0
Entering edit mode

It sounds good, but I have variable step wig so counting of lines with simple commands like awk ls won't work. Are there any program which returns statistics for wig files?

ADD REPLY
0
Entering edit mode

Ok, in the end I followed advice on MACS2 page

usually by taking away the simple repeats and Ns from the total genome, one can get an approximate number of effective genome size.

So I've determined overall genome size based on fasta file: grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | sed -e 's/\(.\)/&\n/g' | wc -l

Similarly I've determine the number of Ns:

grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | sed -e 's/\(.\)/&\n/g' | grep 'N' | wc -l

Finally, I've counted the overall length of repeats as I have appropriate gff file. This step is somewhat arbitrary as there are many kinds of repeats. But in the end it won't affect the output, as I assume from Istvan Alberts' post somewhere at biostars that the first digit in the mappable genome size is most significant.

ADD REPLY

Login before adding your answer.

Traffic: 2630 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6