Question: Bam to RPM Bedgraph Conversion
0
gravatar for virlana.shchuka
2.8 years ago by
virlana.shchuka0 wrote:

I want to convert my ChIP-seq bam files to an RPM bedgraph file. I want to use the following command:

bedtools genomecov -bg -split -scale 1000000 -trackline -trackopts “name=bam_file_name description=bam_file” -ibam bamfilename > bedgraphfilename

I want to clarify that, for an rpm file, I should use the value one million following the "-scale"? Or should the scale factor be 6 because it assumes a 10 base?

Thanks in advance!

chip-seq • 2.0k views
ADD COMMENTlink modified 2.4 years ago by h.mon32k • written 2.8 years ago by virlana.shchuka0

I tried the scaling factor command and receive the following output: (standard_in) 1: illegal character: \342 (standard_in) 1: illegal character: \200 (standard_in) 1: illegal character: \234 -bash: 1000000/10743694": No such file or directory

I checked the spacing between all the parts of the command and they are correct. Does anyone know what I am doing wrong?

ADD REPLYlink written 2.8 years ago by virlana.shchuka0
3
gravatar for ATpoint
2.8 years ago by
ATpoint44k
ATpoint44k wrote:

The value of -scale is the one that each value in column 4 of the bg is multiplied with, means you have to calculate the scaling factor externally. Here is a two-liner that can do it:

 ## Scaling factor for single-end data, counting every mapped read (bitwise flag = 0)
 TmpScale=$(bc <<< "scale=6;1000000/$(samtools view -f 0 -c in.bam)")

 ## Now get the actual bedGraph:
 echo '==> RPM scaling factor:' $TmpScale
 bedtools genomecov -bga -ibam in.bam -scale $TmpScale | sort -k1,1 -k2,2n > out.bedGraph

For matters of completeness, there are tools that can output a RPM-normalized bedGraph or bigwig in one go, like deeptools bamCoverage. Still, both genomecov and deeptools are pretty slow. There is a newer tool, mosdepth, which outputs a compressed bedGraph and is lightning fast. The RPM-normalization can be done using something like awk once the bedGraph has been generated.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by ATpoint44k

Thank you for your reply.

1) Do I need to download specific packages to carry out these commands? 2) Can I just lift over these commands, or are all the arrow characters you've used unimportant?

ADD REPLYlink written 2.8 years ago by virlana.shchuka0

Would it be possible for you to explain what exactly your first command is doing?

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by m98280
1

It counts total number of mapped reads in the bam file, and uses it to calculate a per-million scaling factor. Still, I recommend deeptools bamCoverage for creating browser tracks. Is offers many options that are quiet convinient.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by ATpoint44k

Thanks, makes sense. Last question: how would it work for paired-end data?

ADD REPLYlink written 2.4 years ago by m98280
1

Thinking about it again, for single end, I would probably do -f 0 -F 256 (read mapped, and not supplementary alignment), and for paired, -f 1 -F 256 (read paired, but not supplementary). Bitwise flags explained here.

ADD REPLYlink written 2.4 years ago by ATpoint44k

Sorry to bother you again, but do you happen to know if bamCoverage requires for your input Bam file to be sorted? It doesn't look like but I can't seem to find a definitive answer anywhere.. It definitely seems to want a .bai to be present in the same directory as your bam file.

ADD REPLYlink written 2.4 years ago by m98280
1

It requires indexed BAM, and indexing requires sorted BAM, so yes bamCoverage requires sorted BAM. Use samtools sort to sort your bam and then samtools index to index the sorted bam.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by ATpoint44k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1607 users visited in the last hour
_