8.6 years ago
Shicheng Guo
It is easy to run into errors when running bedGraphToBigWig, so I want to save newcomers some time. The following procedure should work in most situations.
1. Remove any header line from the bedGraph before sorting
awk 'NR!=1' input.bedGraph > input.deheader.bedGraph
2. The bedGraph must be sorted by chromosome, then by start position
sort -k1,1 -k2,2n input.deheader.bedGraph > sorted.bedGraph
3. The chromosome sizes must come from the same genome assembly that the BAM files were aligned to
fetchChromSizes hg19 > hg19.chrom.sizes
4. Make sure the bedGraph has exactly 4 tab-separated columns
awk -v OFS='\t' '{print $1,$2,$3,$4}' NC-P-2.bedGraph_CpG.sort.bedGraph > NC-P-2.bedGraph_CpG.sort.4.bedGraph
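5. Now you can run the conversion. The first argument is the sorted bedGraph (not a BAM file), the second is the chromosome sizes file:
bedGraphToBigWig input.sorted.bedGraph hg19.chrom.sizes output.bw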
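Before running the conversion, it can save time to check the file against steps 1 to 4. The following is only a rough sketch: check_bedgraph is a hypothetical helper name, not part of the UCSC tools, and it only tests for 4 tab-separated columns, start smaller than end, contiguous chromosomes and ascending start positions. It will not catch every problem that bedGraphToBigWig can report.

```shell
# Hypothetical helper (not a UCSC tool): print the first problem found in a
# bedGraph file, or "OK" if none of the checks below fails.
# Usage: check_bedgraph file.bedGraph
check_bedgraph() {
    LC_ALL=C awk -F'\t' '
        # A leftover header or extra columns will fail this test.
        NF != 4 { print "line " NR ": expected 4 tab-separated columns, got " NF; bad = 1; exit }
        # Each interval needs start < end.
        $2 + 0 >= $3 + 0 { print "line " NR ": start is not smaller than end"; bad = 1; exit }
        # Within one chromosome, start positions must not decrease.
        $1 == chrom && $2 + 0 < start { print "line " NR ": not sorted by start position"; bad = 1; exit }
        # A chromosome must not reappear after another one has started.
        $1 != chrom && ($1 in seen) { print "line " NR ": chromosome " $1 " is not contiguous"; bad = 1; exit }
        { chrom = $1; start = $2 + 0; seen[$1] = 1 }
        END { if (!bad) print "OK"; exit bad }
    ' "$1"
}
```

For example, check_bedgraph sorted.bedGraph prints OK and returns 0 when the checks pass; otherwise it prints the first offending line number and returns 1.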
In summary, you can do the bedGraph-to-bigWig conversion in two steps. The first command drops the first line (assumed to be a header, as in step 1), sorts the rest and keeps only 4 tab-separated columns:
tail -n +2 NC-P-25.bedGraph_CpG.bedGraph | sort -k1,1 -k2,2n | awk '{print $1,$2,$3,$4}' OFS="\t" > NC-P-25.bedGraph_CpG.bedGraph.sort
bedGraphToBigWig NC-P-25.bedGraph_CpG.bedGraph.sort hg19.chrom.sizes NC-P-25.bw
For a large number of bedGraph files:
# bedGraph to bigWig; the first line of each file is assumed to be a header and is dropped
for i in *bedGraph
do
tail -n +2 "$i" | sort -k1,1 -k2,2n | awk '{print $1,$2,$3,$4}' OFS="\t" > "$i.sort"
bedGraphToBigWig "$i.sort" ~/oasis/db/hg19/hg19.chrom.sizes "$i.bw"
done
It is good to have this written up. Could you also say where to get fetchChromSizes, with a link? In any case, there are numerous ways to obtain chromosome sizes. I can simply link to HOMER, as it is a suite covering a number of *seq pipelines. For a newbie this will obviously be important. As you know, some software also provides the chromosome sizes directly. There is also an error in the post: fetchChromSizes hg19 > hg18.chrom.sizes should write to hg19.chrom.sizes.
The order of the file names is also a bit tricky to follow. It would help if you could make it simpler for newbies. I appreciate your effort, and the post could be enriched further.
Another thing: sorting (I appreciate that you included it) is also available in bedops. People should know its power as well, so you could add it if you want (just a suggestion).
Hi, I'm using only Chr1 as my reference genome. So fetchChromSizes hg19 shouldn't work for me, right? What could I use?
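For the single-chromosome question above, one possible approach is to cut the full chrom.sizes file down to the chromosome you need. This is a sketch and not an answer given in the thread; subset_chrom_sizes is a hypothetical helper name. As far as I know, bedGraphToBigWig only looks up the chromosomes that occur in the bedGraph, so the full hg19.chrom.sizes would probably work too. Either way, the name must match exactly, so a reference called Chr1 will not match the UCSC name chr1.

```shell
# Hypothetical helper: keep only one chromosome from a chrom.sizes file.
# $1 = chromosome name exactly as it appears in the bedGraph (e.g. chr1)
# $2 = full chrom.sizes file (e.g. the output of fetchChromSizes)
subset_chrom_sizes() {
    awk -v chrom="$1" '$1 == chrom' "$2"
}
```

It could be used as subset_chrom_sizes chr1 hg19.chrom.sizes > chr1.chrom.sizes, with the resulting file passed to bedGraphToBigWig.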
Note that historically there are small differences in the way that NCBI, EBI and UCSC name the chromosomes. What is "MT" for EBI, is called "chrMT" for NCBI and "chrM" for UCSC. If you used a genome not from UCSC for your analysis, you may have to fix up these small differences. To convert EBI or NCBI chrom names to UCSC chrom names in a wig or bed text file, you can use UCSC's little utility chromToUcsc. Download it with "wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/chromToUcsc", make it executable with "chmod a+x chromToUcsc" (it's a python2/3 script) and run it without arguments to get the usage message. Here is an example call: chromToUcsc -g hg19 --get && chromToUcsc -i test.wig -o test.ucsc.wig -a hg19.chromAlias.tsv -g hg19
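To see whether a file is affected by these naming differences before converting it, you can list the chromosome names in the bedGraph that do not occur in the chrom.sizes file. This is a minimal sketch using plain awk; missing_chroms is a hypothetical helper name and is not part of the UCSC utilities.

```shell
# Hypothetical helper: print each chromosome name used in a bedGraph file that
# is absent from a chrom.sizes file (one name per line; no output if all match).
# $1 = chrom.sizes file, $2 = bedGraph file without a header line
missing_chroms() {
    awk 'NR == FNR { known[$1] = 1; next }      # first file: remember known names
         !($1 in known) && !($1 in reported) {  # second file: report unknown names once
             print $1; reported[$1] = 1
         }' "$1" "$2"
}
```

If missing_chroms hg19.chrom.sizes sorted.bedGraph prints names such as 1 or MT, the file uses non-UCSC names and needs converting (for example with chromToUcsc) before bedGraphToBigWig is run.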