Question: Integrating NGS Data for Machine Learning
20 months ago, email.egail (0) wrote:

I want to use a machine learning algorithm to predict whether a particular gene is expressed, based on which histone marks/proteins are bound at it (likely from ChIP-seq data).

The input would be a matrix organized by region (like a BED file), with columns such as whether the region has a called peak (from ChIP-seq data), whether the gene is expressed (from RNA-seq data), and any other NGS data that could be integrated.

However, I am having some issues:

I’m having some trouble integrating the RNA-seq and ChIP-seq data. I’m trying to use the intersect command from bedtools but I am not getting any results.

bedtools intersect -a ref.bed -b fileA.bed fileB.bed > output.bed

Is there another/better way to see the overlap?

Ideally, I would like to use multiple cell types so that the model generalizes. However, this would add a third dimension to my data, and all of the tools I am familiar with take only two-dimensional data. How best would I incorporate this extra dimension into my dataset?

modified 20 months ago by timpaines (0) • written 20 months ago by email.egail (0)
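For readers following along, the region-by-feature matrix described in the question could look roughly like the sketch below. It is only an illustration, not the poster's pipeline: the gene names, mark names (H3K4me3, H3K27me3), coordinates and expression labels are all made up, and the nested loop is only reasonable for small inputs (for real data, bedtools or an interval index would be the usual choice).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """BED intervals are half-open, so two intervals overlap
    when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def build_feature_matrix(regions, peak_sets, expressed_genes):
    """Return (marks, rows): one row per region, holding a 0/1 flag
    per ChIP-seq peak set plus a 0/1 expression label."""
    marks = list(peak_sets)  # keeps the insertion order of the dict
    rows = []
    for chrom, start, end, name in regions:
        flags = []
        for mark in marks:
            hit = any(
                p_chrom == chrom and overlaps(start, end, p_start, p_end)
                for p_chrom, p_start, p_end in peak_sets[mark]
            )
            flags.append(int(hit))
        rows.append((name, flags, int(name in expressed_genes)))
    return marks, rows


# Hypothetical toy data: (chrom, start, end, gene name)
regions = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 500, 600, "geneB"),
    ("chr2", 100, 200, "geneC"),
]
# Hypothetical called peaks per mark: (chrom, start, end)
peak_sets = {
    "H3K4me3": [("chr1", 150, 250)],
    "H3K27me3": [("chr1", 590, 700), ("chr2", 300, 400)],
}
# Hypothetical RNA-seq calls
expressed_genes = {"geneA"}

marks, rows = build_feature_matrix(regions, peak_sets, expressed_genes)
for name, flags, label in rows:
    print(name, dict(zip(marks, flags)), "expressed:", label)
```

Each row is then one training example: the flags are the features and the expression label is the target.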

Data with more than two dimensions are generally called tensors in the machine learning and data mining communities. There are multiple ways you could go forward, depending on your data. You could try tensor regression or support tensor regression, use kernels on tensors so that you can fall back on standard kernel methods, or use tensor factorization to project your data into a latent feature space where you could use standard 2D methods. If you're into the current deep learning fashion, you could also use a neural network to extract features that you can then use with a more standard machine learning method.

written 20 months ago by Jean-Karim Heriche (23k)

Out of interest, Jean-Karim, if you are working in this area, which programs / resources are you using?

written 20 months ago by Kevin Blighe (66k)

I assume the area is tensors, not deep learning. For this, I am using R with the rTensor package as the base for my own functions (e.g. tensor ridge regression). There's also the nnTensor package for non-negative factorizations.

written 20 months ago by Jean-Karim Heriche (23k)
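As a concrete picture of the "third dimension" discussed in this thread, here is a minimal sketch of a cell type × region × feature tensor and two ways of unfolding it into a 2D matrix. Note that this is plain unfolding (matricization), a simpler fallback than the tensor regression or factorization methods suggested above, and it is written in Python with nested lists rather than with rTensor. The cell type names, gene names and values are hypothetical.

```python
def unfold_by_sample(tensor, cell_types, regions):
    """Stack the cell-type axis into rows: one row per
    (cell type, region) pair, columns are the features."""
    labels, rows = [], []
    for c, cell in enumerate(cell_types):
        for r, region in enumerate(regions):
            labels.append((cell, region))
            rows.append(list(tensor[c][r]))
    return labels, rows


def unfold_by_region(tensor):
    """Concatenate features across cell types: one row per region,
    columns are (cell type, feature) combinations."""
    n_regions = len(tensor[0])
    return [
        [value for cell_slice in tensor for value in cell_slice[r]]
        for r in range(n_regions)
    ]


# Hypothetical toy tensor indexed as tensor[cell type][region][feature]
cell_types = ["cellA", "cellB"]
regions = ["geneA", "geneB"]
tensor = [
    [[1, 0, 1], [0, 0, 1]],  # cellA: features for geneA, geneB
    [[1, 1, 0], [0, 1, 0]],  # cellB: features for geneA, geneB
]

labels, long_rows = unfold_by_sample(tensor, cell_types, regions)
wide_rows = unfold_by_region(tensor)

for label, row in zip(labels, long_rows):
    print(label, row)
print(wide_rows)
```

The first form treats every (cell type, gene) pair as its own sample, which suits a model meant to generalize across cell types. The second keeps one row per gene but multiplies the number of columns. Either way the unfolded matrix can go into ordinary 2D tools, at the cost of ignoring the structure that the tensor methods above are designed to exploit.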

Regarding bedtools, try -b fileA.bed,fileB.bed

modified 20 months ago • written 20 months ago by geek_y (11k)
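Separately from the -b syntax, one frequent cause of an empty intersect result is that the files use different chromosome naming conventions (for example "chr1" in one file and "1" in the other). Whether that applies to the files in this thread is not known. The sketch below shows a quick way to check, assuming tab-delimited BED input; the example lines are made up.

```python
def chrom_names(bed_lines):
    """Collect the distinct chromosome names (first column) from BED
    lines, skipping blank lines, comments and track/browser headers."""
    names = set()
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


# Hypothetical file contents; with real files you could pass an open
# file handle instead, e.g. chrom_names(open("ref.bed")).
ref_lines = [
    "track name=genes",
    "chr1\t100\t200\tgeneA",
    "chr2\t100\t200\tgeneC",
]
peak_lines = [
    "1\t150\t250",
    "2\t300\t400",
]

ref_chroms = chrom_names(ref_lines)
peak_chroms = chrom_names(peak_lines)
shared = ref_chroms & peak_chroms

print("reference:", sorted(ref_chroms))
print("peaks:", sorted(peak_chroms))
if not shared:
    print("No chromosome names in common, so no overlaps can be reported.")
```

If the two sets share no names, the naming has to be harmonized in one of the files before any overlap tool can report results.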