Overall mappability of genome
0
0
Entering edit mode
7 months ago
boczniak767 ▴ 830

Hi,

thanks to extremelly useful page gem-mappability I know how to calculate this measure for each base of my sequence. It looks nice in IGV as transposons have zeroes.

The question is, how could I get one value (percentage) of uniquelly mappable genome? I need this value for peak-caller (Homer, MACS).

gem mappability ngs • 416 views
1
Entering edit mode

Guess you could count bases that have a non-zero mappability value and use that number. This likely does not need to be absolutely precise.

0
Entering edit mode

It sounds good, but I have variable step wig so counting of lines with simple commands like awk ls won't work. Are there any program which returns statistics for wig files?

0
Entering edit mode

Ok, in the end I followed advice on MACS2 page

usually by taking away the simple repeats and Ns from the total genome, one can get an approximate number of effective genome size.

So I've determined overall genome size based on fasta file: grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | sed -e 's/$$.$$/&\n/g' | wc -l

Similarly I've determine the number of Ns:

grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | sed -e 's/$$.$$/&\n/g' | grep 'N' | wc -l

Finally, I've counted the overall length of repeats as I have appropriate gff file. This step is somewhat arbitrary as there are many kinds of repeats. But in the end it won't affect the output, as I assume from Istvan Alberts' post somewhere at biostars that the first digit in the mappable genome size is most significant.