I have a list of genes and a coverage track (generated using BEDOPS-based binning script also called as
binReads.sh), modified it to take bed file as input).
What I want do it to divide the each gene into 100 bins and then calculate the mean overlap from my coverage file, so that later all the respective bins can be added for all genes to give a average protein binding/coverage in that bin.
I couldn't understand that how I can make a specific number of bins for the genes though I can regulate the binSize using
binReads.sh. I made a custom R code, for taking my list of genes and the dividing each of them into 100 bins.
Then, I used this list as a reference, (added a fake 4th column, which is also not very intuitive) to find the mean overlap for my coverage file.
1) This method is not very intutitive (because of R code, it takes ~4-6 mins to complete the process for a list of 30.2K genes producing 30.2K*100 lines as a bed file to be used as a reference). It there a way in bedOps suite to generate the bins on the fly.
2) The output of
bedmap --mean genes_bins.bed file.cov > means.tsv is valid numeric for first 42K reference lines out of 3.2M(30.2K*100) and for the rest it produces "NAN". To cross-validate, I took some specific binned genes and it produces a valid output. Is this because of the size of the files (shouldn't be).
Can you suggest some other way to approach this problem.
genes_bins.bed chr1 3214481 3219051 Xkr4-bin1 chr1 3219051 3223621 Xkr4-bin2 chr1 3223621 3228192 Xkr4-bin3 chr1 3228192 3232762 Xkr4-bin4 chr1 3232762 3237332 Xkr4-bin5 chr1 3237332 3241902 Xkr4-bin6 coverage.bed chr1 3001500 3001520 1 1 chr1 3001520 3001540 1 1 chr1 3001540 3001560 1 1 chr1 3001560 3001580 1 1 chr1 3001580 3001600 1 1 chr1 3001600 3001620 1 1 chr1 3001620 3001640 1 1
P.S. It was started as comment under Converting BAM to bedGraph for viewing on UCSC and I thought its better to ask as a seperate question. This probably be answered by Alex Reynolds himself. The final aim to make the composite profiles of your protein of interest from the chip-seq datasets.