Question: RNA-Seq replicates handling for prediction
3.9 years ago, Tobias wrote:

Currently, I am trying to analyze RNA-Seq (or other gene expression) data from approximately 1000 different samples. This gives a matrix of dimension 20000 (number of genes) x 1000 (number of samples), where entry (i, j) is the expression of gene i in sample j.

My aim is to predict the gene expression in each of these samples. For that purpose I would do a 10-fold CV on the samples, i.e., I split the samples into 10 chunks of 10% each and predict the gene expression values of a given sample with a model fitted on the 9 chunks that do not contain that sample.
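
The sample-wise split described above can be sketched as follows. This is only a minimal illustration using the Python standard library; the function name `kfold_sample_indices` and the fixed seed are assumptions made for the example, and the model fitting step is deliberately left out because the question does not specify a model.

```python
import random

def kfold_sample_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds test chunks."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold k takes every n_folds-th shuffled index starting at position k,
    # so chunk sizes differ by at most one sample.
    return [indices[k::n_folds] for k in range(n_folds)]

n_samples = 1000
folds = kfold_sample_indices(n_samples, n_folds=10)

for k, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(n_samples) if j not in held_out]
    # A model would be fitted on the matrix columns in train_idx and then
    # used to predict the expression values of the columns in test_idx.
    print(f"fold {k}: {len(train_idx)} training samples, {len(test_idx)} test samples")
```

Note that this split ignores which cell line a sample comes from, so replicates of one cell line can end up on both sides of the split.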

Now, for several cell lines there are multiple replicate samples. Hence it might be that I predict the gene expression values of a sample (cell line) with a model fitted on other samples, some of which are replicates of the same cell line.
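
To make the situation concrete: if one wanted replicates of a cell line never to appear in both the training and the test chunk, a possible approach is to assign whole cell lines, rather than single samples, to folds. The sketch below is a hedged illustration, not a statement about what the analysis should do; the function name `grouped_kfold` and the cell line labels are hypothetical, and the greedy balancing is just one simple choice.

```python
import random
from collections import defaultdict

def grouped_kfold(cell_line_of_sample, n_folds=10, seed=0):
    """Assign whole cell lines to folds so that replicates stay together."""
    samples_by_line = defaultdict(list)
    for sample_idx, line in enumerate(cell_line_of_sample):
        samples_by_line[line].append(sample_idx)

    # Shuffle cell lines so ties between equally sized groups are broken randomly.
    lines = sorted(samples_by_line)
    random.Random(seed).shuffle(lines)

    grouped_folds = [[] for _ in range(n_folds)]
    # Greedy balancing: largest cell lines first, each into the smallest fold so far.
    for line in sorted(lines, key=lambda name: -len(samples_by_line[name])):
        smallest = min(range(n_folds), key=lambda k: len(grouped_folds[k]))
        grouped_folds[smallest].extend(samples_by_line[line])
    return grouped_folds

# Hypothetical labels: 12 samples drawn from 5 cell lines.
labels = ["A", "A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E"]
grouped_folds = grouped_kfold(labels, n_folds=3)

for test_idx in grouped_folds:
    held_out = set(test_idx)
    test_lines = {labels[j] for j in held_out}
    train_lines = {labels[j] for j in range(len(labels)) if j not in held_out}
    # No cell line contributes samples to both sides of the split.
    assert not (test_lines & train_lines)
```

Comparing the prediction error from a split like this with the error from a plain sample-wise split would show how much the replicates in the training set actually contribute.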

Is such a thing conceptually correct or not? Additionally, it may be worth adding that the correlations between replicate samples (computed over all genes) are in a similar range (0.4-0.9) to the correlations between most other pairs of samples.
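
For reference, the comparison of replicate and non-replicate correlations mentioned above could be computed roughly like this. It is a sketch under assumptions: Pearson correlation is assumed (the post does not say which coefficient was used), the function names are hypothetical, and the three toy samples are made-up numbers that only show the mechanics.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long expression vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def split_pairwise_correlations(columns, cell_line_of_sample):
    """Return correlations of replicate pairs and of all other sample pairs."""
    within, between = [], []
    for i, j in combinations(range(len(columns)), 2):
        r = pearson(columns[i], columns[j])
        if cell_line_of_sample[i] == cell_line_of_sample[j]:
            within.append(r)   # both samples are replicates of one cell line
        else:
            between.append(r)  # samples come from different cell lines
    return within, between

# Toy data: each inner list is one sample (a matrix column) over 4 genes.
toy_columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
toy_labels = ["A", "A", "B"]
within, between = split_pairwise_correlations(toy_columns, toy_labels)
print("replicate pairs:", within)
print("other pairs:", between)
```

Comparing the two resulting distributions (for example their ranges or medians) gives a direct check of whether replicates are more similar to each other than arbitrary pairs of samples are.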

Many thanks for your help in advance!

Tags: sequencing, rna-seq, chip-seq, R