Question: Integrating NGS Data for Machine Learning
5 months ago, email.egail (0) wrote:

I want to use a machine learning algorithm to predict whether a particular gene is expressed, based on its binding with multiple histones/proteins (likely from ChIP-seq data).

The input would be a matrix with one row per region (like a BED file), containing data such as whether the region has a called peak (from ChIP-seq data), whether the gene is expressed (from RNA-seq data), and any other NGS data that could be integrated.
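
As a rough illustration of that layout, here is a minimal Python sketch. Everything in it is made up for the example: the region names, the two mark names, the expression values, and the cutoff used to call a gene "expressed" are all assumptions, not real data.

```python
# Hypothetical inputs: all names and values are illustrative only.
regions = ["geneA", "geneB", "geneC"]

# For each ChIP-seq target, the set of regions that contain a called peak.
peaks_by_mark = {
    "H3K4me3": {"geneA", "geneC"},
    "H3K27me3": {"geneB"},
}

# Expression per region from RNA-seq; the cutoff below is an assumption.
expression = {"geneA": 25.0, "geneB": 0.2, "geneC": 3.1}
EXPRESSED_CUTOFF = 1.0


def build_matrix(regions, peaks_by_mark, expression, cutoff):
    """Return (header, rows): one row per region, 0/1 per mark, then a 0/1 label."""
    marks = sorted(peaks_by_mark)
    header = ["region"] + marks + ["expressed"]
    rows = []
    for region in regions:
        features = [int(region in peaks_by_mark[m]) for m in marks]
        label = int(expression.get(region, 0.0) >= cutoff)
        rows.append([region] + features + [label])
    return header, rows


header, rows = build_matrix(regions, peaks_by_mark, expression, EXPRESSED_CUTOFF)
print("\t".join(header))
for row in rows:
    print("\t".join(str(v) for v in row))
```

The peak columns would be the features and the last column the label to predict; other NGS-derived values could be added as further columns in the same way.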

However, I am having some issues:

I’m having trouble integrating the RNA-seq and ChIP-seq data. I’m trying to use the bedtools intersect command, but I am not getting any results:

bedtools intersect -a ref.bed -b fileA.bed fileB.bed > output.bed

Is there another/better way to see the overlap?

Ideally, I would like to use multiple cell types so that the model generalizes. However, this would add a third dimension to my data, and all of the tools I am familiar with only take two-dimensional data. How would I best incorporate this extra dimension into my dataset?

modified 5 months ago by timpaines (0) • written 5 months ago by email.egail (0)

Data with more than two dimensions are generally called tensors in the machine learning and data mining communities. There are multiple ways you could go forward, depending on your data. You could try tensor regression or support tensor regression, use kernels on tensors to fall back on standard kernel methods, or use tensor factorization to project your data into a latent feature space where you could use standard 2D methods. If you're into the current deep learning fashion, you could also use a neural network to extract features that you can then use with a more standard machine learning method.
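
To make the "back to 2D" idea concrete, here is a small Python sketch of tensor unfolding (matricization). Note that this is a simpler operation than the factorization or tensor regression mentioned above: it only rearranges the 3-way array into a matrix and learns no latent features. The cell type, region, and feature names, and all values, are invented for the example.

```python
# Toy 3-way data: tensor[cell_type][region][feature]; values are made up.
cell_types = ["cellX", "cellY"]
regions = ["geneA", "geneB", "geneC"]
features = ["H3K4me3_peak", "H3K27me3_peak"]

tensor = [
    [[1, 0], [0, 1], [1, 0]],  # cellX
    [[1, 1], [0, 0], [0, 1]],  # cellY
]


def unfold_by_sample(tensor, cell_types, regions):
    """Stack cell types as extra rows: one row per (cell type, region) pair."""
    row_ids, rows = [], []
    for c, cell in enumerate(cell_types):
        for r, region in enumerate(regions):
            row_ids.append((cell, region))
            rows.append(list(tensor[c][r]))
    return row_ids, rows


def unfold_by_feature(tensor, cell_types, features, regions):
    """Keep one row per region; concatenate each cell type's features as columns."""
    col_ids = [f"{cell}:{feat}" for cell in cell_types for feat in features]
    rows = []
    for r in range(len(regions)):
        row = []
        for c in range(len(cell_types)):
            row.extend(tensor[c][r])
        rows.append(row)
    return col_ids, rows


row_ids, stacked = unfold_by_sample(tensor, cell_types, regions)
col_ids, widened = unfold_by_feature(tensor, cell_types, features, regions)
print(len(stacked), "rows x", len(stacked[0]), "columns")   # samples stacked
print(len(widened), "rows x", len(widened[0]), "columns")   # features widened
```

Stacking rows treats each (cell type, region) pair as its own sample, which suits generalizing across cell types; widening columns keeps one sample per region but ties the feature set to the specific cell types used.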

written 5 months ago by Jean-Karim Heriche (20k)

Out of interest, Jean-Karim, if you are working in this area, which programs / resources are you using?

written 5 months ago by Kevin Blighe (45k)

I assume the area is tensors, not deep learning. For this, I am using R with the rTensor package as a base for my own functions (e.g. tensor ridge regression). There's also the nnTensor package for non-negative factorizations.

written 5 months ago by Jean-Karim Heriche (20k)

Regarding bedtools, try -b fileA.bed,fileB.bed
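
If the output is still empty, it may be worth checking the files themselves outside bedtools. One possible cause (an assumption here, since the actual files are not shown) is that the chromosome names do not match between files, e.g. "chr1" in one and "1" in the other. The Python sketch below uses toy intervals standing in for ref.bed and fileA.bed; the helper names are hypothetical.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-like lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    intervals = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def count_overlaps(ref, other):
    """Count reference intervals that overlap at least one interval in `other`."""
    hits = 0
    for chrom, ref_ivs in ref.items():
        other_ivs = other.get(chrom, [])
        for r_start, r_end in ref_ivs:
            if any(r_start < o_end and o_start < r_end for o_start, o_end in other_ivs):
                hits += 1
    return hits


# Toy data standing in for ref.bed and fileA.bed; note the "chr" prefix mismatch.
ref = parse_bed(["chr1\t100\t200", "chr1\t500\t600", "chr2\t50\t80"])
file_a = parse_bed(["1\t150\t250", "2\t60\t70"])

shared = set(ref) & set(file_a)
print("shared chromosome names:", sorted(shared))
print("overlapping reference regions:", count_overlaps(ref, file_a))

# Harmonizing the names (here by adding the prefix) recovers the overlaps.
file_a_fixed = {"chr" + chrom: ivs for chrom, ivs in file_a.items()}
print("after renaming:", count_overlaps(ref, file_a_fixed))
```

With real files, the lines would come from open("ref.bed") and so on; an empty set of shared chromosome names would explain an empty intersect result.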

modified 5 months ago • written 5 months ago by geek_y (9.8k)