I want to use a machine learning algorithm to predict whether a particular gene is expressed, based on its binding with multiple histones/proteins (likely from ChIP-seq data).
The input would be a matrix organized by region (like a BED file), with columns such as whether the region has a called peak (from ChIP-seq data), whether the gene is expressed (from RNA-seq data), and any other NGS data that could be integrated.
However, I am having some issues:
I’m having trouble integrating the RNA-seq and ChIP-seq data. I’m trying to use the intersect command from bedtools, but the output is empty:
bedtools intersect -a ref.bed -b fileA.bed fileB.bed > output.bed
Is there another/better way to see the overlap?
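A common cause of empty intersect output is a chromosome-naming mismatch between files (for example `chr1` in one and `1` in the other). A minimal sketch of a check for that, assuming tab-delimited BED files with the chromosome name in column 1; `shared_chroms` is a hypothetical helper name, not part of bedtools:

```shell
# Print chromosome names that appear in column 1 of BOTH BED files.
# An empty result would explain why an intersect finds no overlaps.
shared_chroms() {
    awk -F'\t' '
        /^(track|browser|#)/ { next }               # skip BED header/comment lines
        NR == FNR { seen[$1] = 1; next }            # first file: remember each name
        ($1 in seen) && !printed[$1]++ { print $1 } # second file: print shared names once
    ' "$1" "$2"
}
```

Running `shared_chroms ref.bed fileA.bed` and getting nothing back would point to a naming problem rather than a true lack of overlap.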
Ideally, I would like to use multiple cell types so that the model generalizes. However, this would add a third dimension to my data, and all of the tools I am familiar with only take two-dimensional data. How would I best incorporate this extra dimension into my dataset?
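One option that might keep the data two-dimensional is to stack the per-cell-type matrices as extra rows and record the cell type in an added column ("long" format), so that each row is one region in one cell type. A minimal sketch, assuming one tab-delimited matrix per cell type with identical columns and no header line; the function name `stack_cell_types` and any file names passed to it are hypothetical:

```shell
# Append a cell-type label to every row of each per-cell-type matrix and
# concatenate the results into one 2D table with cell type as the last column.
# Arguments are label=path pairs, e.g. cellA=cellA.tsv
stack_cell_types() {
    for pair in "$@"; do
        label=${pair%%=*}   # text before the first '='
        path=${pair#*=}     # text after the first '='
        awk -v ct="$label" 'BEGIN { FS = OFS = "\t" } { print $0, ct }' "$path"
    done
}
```

For example, `stack_cell_types cellA=cellA.tsv cellB=cellB.tsv > combined.tsv` would produce a single table. Whether treating cell type as a column is appropriate for the model is a separate question.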