boczniak767, 4.5 years ago
Hi,
thanks to the extremely useful gem-mappability page, I know how to calculate this measure for each base of my sequence. It looks nice in IGV, where transposons show zero mappability.
The question is: how can I get a single value (a percentage) for the uniquely mappable part of the genome?
I need this value for peak callers (Homer, MACS).
Guess you could count the bases that have a non-zero mappability value and use that number. This likely does not need to be absolutely precise.

That sounds good, but my wig file is variableStep, so counting lines with simple commands like awk or ls won't work. Is there any program that reports statistics for wig files?

OK, in the end I followed the advice on the MACS2 page.
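Counting lines is not enough for variableStep data because each record can cover `span` bases, but a short awk script that honours `span` can do the counting. A minimal sketch, assuming standard wig syntax (`variableStep`/`fixedStep` header lines with an optional `span=`, default 1) and assuming, as I understand GEM's output, that a value of 1 marks a k-mer occurring only once in the genome; `wig_mappable_bases` is a hypothetical helper name, not part of GEM:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count bases in a wig file whose value is >= a threshold.
# Assumes standard wig syntax: variableStep/fixedStep header lines with an
# optional span= (default 1); each data line then covers `span` bases.
# Assumes records do not overlap (for fixedStep, step >= span).
# Usage: wig_mappable_bases FILE.wig [MIN_VALUE]   (MIN_VALUE defaults to 1)
wig_mappable_bases() {
  awk -v min="${2:-1}" '
    BEGIN { thr = min + 0 }
    /^(track|browser|#)/ { next }           # skip track/browser/comment lines
    /^variableStep/ || /^fixedStep/ {
      mode = ($1 == "variableStep") ? "var" : "fixed"
      span = 1                              # wig default when span= is absent
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "span") span = kv[2] + 0
      }
      next
    }
    # variableStep data line: "<position> <value>"
    mode == "var" && NF >= 2 { if ($2 + 0 >= thr) total += span; next }
    # fixedStep data line: "<value>"
    mode == "fixed" && NF >= 1 { if ($1 + 0 >= thr) total += span; next }
    # %.0f rather than %d so multi-gigabase totals do not overflow in mawk
    END { printf "%.0f\n", total }
  ' "$1"
}
```

For example, `wig_mappable_bases mappability.wig 1` (the file name is a placeholder) would give the number of uniquely mappable bases; passing a small positive threshold such as `0.000001` counts every non-zero base instead, as suggested above. Dividing the result by the genome size gives the percentage.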
So I determined the overall genome size from the FASTA file:
grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | tr -d '\n' | wc -c
Similarly, I determined the number of Ns:
grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | tr -cd 'Nn' | wc -c
Finally, I counted the overall length of repeats, since I have an appropriate GFF file. This step is somewhat arbitrary, as there are many kinds of repeats. In the end, though, it shouldn't affect the output much: as I understand from Istvan Albert's post somewhere on Biostars, only the first digit of the mappable genome size really matters.
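The FASTA and GFF steps can also be scripted together. A minimal sketch, assuming a GFF3 file that contains only repeat annotations (columns 4 and 5 holding 1-based inclusive coordinates) and assuming the effective size is taken as total minus N minus repeat bases; all function names are hypothetical. Unlike a plain sum of feature lengths, the GFF step merges overlapping repeats so no base is counted twice:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for estimating an effective (mappable) genome size.

# Prints "<total_bases> <n_bases>" for a FASTA file in a single pass.
# Lowercase (soft-masked) n is counted as N as well.
fasta_size_and_n() {
  awk '
    /^>/ { next }                  # skip header lines
    {
      sub(/\r$/, "")               # tolerate Windows line endings
      total += length($0)
      n += gsub(/[Nn]/, "")        # gsub returns the number of replacements
    }
    END { printf "%.0f %.0f\n", total, n }
  ' "$1"
}

# Prints the number of bases covered by at least one GFF feature.
# Assumes every feature in the file is a repeat; overlaps are merged.
gff_repeat_bases() {
  grep -v '^#' "$1" \
    | awk -F'\t' 'NF >= 5 { print $1 "\t" $4 "\t" $5 }' \
    | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n \
    | awk -F'\t' '
        # Start a new block on a new chromosome or after a gap; else extend it.
        $1 != chrom || $2 + 0 > end + 1 {
          if (chrom != "") total += end - start + 1
          chrom = $1; start = $2 + 0; end = $3 + 0
          next
        }
        $3 + 0 > end { end = $3 + 0 }
        END { if (chrom != "") total += end - start + 1; printf "%.0f\n", total }
      '
}

# Effective size = total - N - repeats.
# Assumes repeat annotations do not overlap N runs (otherwise those bases
# would be subtracted twice).
effective_genome_size() {
  local counts total n repeats
  counts=$(fasta_size_and_n "$1")
  total=${counts% *}
  n=${counts#* }
  repeats=$(gff_repeat_bases "$2")
  echo $(( total - n - repeats ))
}
```

For example, `effective_genome_size Zm-B73-REFERENCE-NAM-5.0.fa repeats.gff3` (the GFF name is a placeholder) would print a number that could be passed to MACS2 via `-g`/`--gsize`. Given the point above that only the first digit really matters, small inaccuracies from the repeat definition should not change the result much.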