The value of -scale is the factor by which each value in column 4 of the bedGraph is multiplied, which means you have to calculate the scaling factor externally. Here is a two-liner that can do it:
## Scaling factor for single-end data, counting every mapped read (-F 4 excludes unmapped reads; -f 0 would count all reads)
TmpScale=$(bc <<< "scale=6;1000000/$(samtools view -F 4 -c in.bam)")
echo '==> RPM scaling factor:' "$TmpScale"
## Now get the actual bedGraph:
bedtools genomecov -bga -ibam in.bam -scale "$TmpScale" | sort -k1,1 -k2,2n > out.bedGraph
For completeness, there are tools that can output an RPM-normalized bedGraph or bigWig in one go, such as deeptools bamCoverage. Still, both genomecov and deeptools are pretty slow. There is a newer tool, mosdepth, which outputs a compressed bedGraph and is lightning fast. The RPM normalization can then be done with something like awk once the bedGraph has been generated.
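As a rough illustration of the awk step, here is a minimal sketch. The function name rpm_normalize and the toy input are made up for this example; it assumes a tab-separated bedGraph on stdin with the raw depth in column 4 and takes the total read count as its only argument.

```shell
# Hypothetical helper (not part of any tool): multiply column 4 of a
# bedGraph read from stdin by 1e6 / total_reads, i.e. convert depth to RPM.
rpm_normalize() {
  awk -F'\t' -v total="$1" '
    BEGIN { OFS = "\t"; sf = 1000000 / total }        # RPM scaling factor
    { $4 = sprintf("%.6f", $4 * sf); print }          # rescale depth column
  '
}

# Toy input: three intervals with raw depths 0, 5 and 20,
# and an assumed total of 2,000,000 reads (scaling factor 0.5).
printf 'chr1\t0\t10\t0\nchr1\t10\t25\t5\nchr1\t25\t40\t20\n' | rpm_normalize 2000000
```

With real data the input would be the decompressed per-base output of mosdepth and the read count would come from something like samtools view -c -F 4 in.bam; the exact file names depend on the prefix you pass to mosdepth.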
written 2.8 years ago by ATpoint
I tried the scaling factor command and received the following output:
(standard_in) 1: illegal character: \342
(standard_in) 1: illegal character: \200
(standard_in) 1: illegal character: \234
-bash: 1000000/10743694": No such file or directory
I checked the spacing between all the parts of the command and it is correct. Does anyone know what I am doing wrong?