Question

How to calculate the difference between individual RNA-Seq samples?

2.8 years ago

mohammedtoufiq91 ▴ 250

Hi,

I ran an RNA-Seq analysis with the edgeR package using a single sample per condition. For each sample, the output contains log counts per million (logCPM) values. I then took the antilog of the logCPM values, because another in-house pipeline does not accept log-transformed values, only normalized ones. Using these normalized (CPM) values, I compare each individual sample against the baseline sample (CPM_778981). The loop below compares each sample's expression value with the average of the baseline samples. In my case, however, I have only one baseline sample, so there are no replicates to average. How do I modify this loop for a single baseline sample?
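A minimal sketch of the antilog step described above, assuming logcpm is a numeric matrix of edgeR logCPM values with genes in rows and samples in columns (the matrix here is hypothetical). edgeR reports logCPM on the log2 scale, so the antilog is 2^x rather than exp(x).

```r
# Hypothetical logCPM matrix: genes in rows, samples in columns
logcpm <- matrix(c(1, 2, 3,
                   4, 5, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"),
                                 c("S1", "S2", "S3")))

# edgeR's logCPM is log2-based, so undo it with 2^x (not exp(x))
cpm_values <- 2^logcpm
print(cpm_values)
```

By default, edgeR's cpm() adds a small prior count (prior.count) before taking logs, so these back-transformed values can differ slightly from cpm(y, log = FALSE) for low-count genes. If the original DGEList is still available, calling cpm() with log = FALSE returns CPM values directly.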

df_raw = CPM.nor
df_raw = df_raw[,rownames(sample_info)]
colnames(df_raw) == rownames(sample_info)

# Difference
Diff.mod.ind.sin <- df_raw[,]
Diff.mod.ind.sin[,] <- NA

k=1
for (k in 1:nrow(df_raw)) {
  signature = rownames(df_raw)[k]
  test.table <- sample_info 
  test.table$scores <- df_raw[k,]
  T4 <- test.table
  T3 <- test.table[test.table$CPM_Comparison %in% c("CPM_778981_vs_778981"),]
  Diff.mod.ind.sin[k,] <- (T4$scores-(mean(T3$scores)))
}

Running the above code with one baseline sample displays the following error message:

Error in `$<-.data.frame`(`*tmp*`, "scores", value = list(CPM_778981_vs_778961 = 125.801134799492,  : 
  replacement has 1 row, data has 6
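The error comes from the line test.table$scores <- df_raw[k,]: indexing a data frame by row returns a one-row data frame (1 x 6 here), not a length-6 vector, so R cannot store it as a column of the six-row test.table. Having a single baseline sample is not itself the problem, since mean() of one value is just that value. A minimal sketch of the loop with the row converted to a vector, assuming small hypothetical stand-ins for df_raw and sample_info and a single baseline column named CPM_778981_vs_778981:

```r
# Hypothetical stand-ins for the objects in the question (3 genes x 3 samples)
df_raw <- data.frame(
  CPM_778981_vs_778961 = c(10, 20, 30),
  CPM_778981_vs_778981 = c(5, 25, 30),
  CPM_778981_vs_778991 = c(8, 20, 40),
  row.names = c("geneA", "geneB", "geneC")
)
sample_info <- data.frame(
  CPM_Comparison = colnames(df_raw),
  row.names = colnames(df_raw)
)
baseline <- "CPM_778981_vs_778981"

# Stop early if the sample order differs between the two objects
stopifnot(identical(colnames(df_raw), rownames(sample_info)))

# Result table with the same shape as df_raw, initialised to NA
Diff.mod.ind.sin <- df_raw
Diff.mod.ind.sin[, ] <- NA

for (k in seq_len(nrow(df_raw))) {
  test.table <- sample_info
  # unlist() turns the one-row data frame into a plain numeric vector
  test.table$scores <- unlist(df_raw[k, ], use.names = FALSE)
  base_rows <- test.table$CPM_Comparison %in% baseline
  # With a single baseline sample, mean() simply returns its value
  Diff.mod.ind.sin[k, ] <- test.table$scores - mean(test.table$scores[base_rows])
}
print(Diff.mod.ind.sin)
```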

dput(head(Diff.mod.ind.sin))
structure(list(CPM_778981_vs_778961 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778971 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778981 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778991 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778951 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_779001 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
)), row.names = c("M13.5_2-Sep", "M16.14_AACS", "M15.73_AAK1", 
"M15.21_AAMP", "M14.72_AARS", "M16.45_AARS2"), class = "data.frame")

dput(head(df_raw))
structure(list(CPM_778981_vs_778961 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778971 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778981 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778991 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_778951 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
), CPM_778981_vs_779001 = c(125.801134799492, 13.4804549449251, 
223.255382390472, 71.5508182752866, 30.7773567291651, 2.8429842311326
)), row.names = c("M13.5_2-Sep", "M16.14_AACS", "M15.73_AAK1", 
"M15.21_AAMP", "M14.72_AARS", "M16.45_AARS2"), class = "data.frame")


dput(head(sample_info))
structure(list(CoreLabID = c(778961L, 778971L, 778981L, 778991L, 
778951L, 779001L), CPM_Comparison = c("CPM_778981_vs_778961", 
"CPM_778981_vs_778971", "CPM_778981_vs_778981", "CPM_778981_vs_778991", 
"CPM_778981_vs_778951", "CPM_778981_vs_779001")), row.names = c("CPM_778981_vs_778961", 
"CPM_778981_vs_778971", "CPM_778981_vs_778981", "CPM_778981_vs_778991", 
"CPM_778981_vs_778951", "CPM_778981_vs_779001"), class = "data.frame")

dput(head(test.table))
structure(list(CoreLabID = c(778961L, 778971L, 778981L, 778991L, 
778951L, 779001L), CPM_Comparison = c("CPM_778981_vs_778961", 
"CPM_778981_vs_778971", "CPM_778981_vs_778981", "CPM_778981_vs_778991", 
"CPM_778981_vs_778951", "CPM_778981_vs_779001")), row.names = c("CPM_778981_vs_778961", 
"CPM_778981_vs_778971", "CPM_778981_vs_778981", "CPM_778981_vs_778991", 
"CPM_778981_vs_778951", "CPM_778981_vs_779001"), class = "data.frame")

Thank you,
Toufiq

The error message is telling you that you're trying to assign data between objects of incompatible dimensions, so work out your dimensions very carefully: insert some statements that print the dimensions of what you're trying to assign. The code above is incredibly hard to read. You might add some comments explaining what various lines do or why they are there. It looks like you're trying to set a column of one object with the row values of another: df_raw[k,] returns a one-row data frame rather than a vector, so it cannot become the scores column of the six-row test.table. Don't wrap your head() calls in dput(); it makes things very confusing to read. You don't need to assign k=1, because the for loop already does that. I think you should think through the problem again, as the code above looks unnecessarily convoluted.
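Following the suggestion to print dimensions and rethink the approach, a minimal vectorized sketch, assuming the CPM values sit in a table with genes in rows and samples in columns and that the baseline column names are known (the data and the names S1, S2, S3 and baseline_cols are hypothetical). Subtracting a per-gene baseline with sweep() replaces the row-by-row loop and works for one baseline sample or several.

```r
# Hypothetical CPM table: genes in rows, samples in columns
df_raw <- data.frame(
  S1 = c(10, 20, 30),
  S2 = c(5, 25, 30),
  S3 = c(8, 20, 40),
  row.names = c("geneA", "geneB", "geneC")
)
baseline_cols <- c("S2")  # one baseline sample; list more names if replicates exist

cpm_mat <- as.matrix(df_raw)

# Print dimensions before combining objects
print(dim(cpm_mat))                                 # genes x samples
print(dim(cpm_mat[, baseline_cols, drop = FALSE]))  # genes x baseline samples

# Per-gene baseline: mean over baseline columns (with one column, just that column)
baseline_mean <- rowMeans(cpm_mat[, baseline_cols, drop = FALSE])

# Subtract each gene's baseline from every sample (MARGIN = 1 means rows)
diff_mat <- sweep(cpm_mat, MARGIN = 1, STATS = baseline_mean, FUN = "-")
print(diff_mat)
```

drop = FALSE keeps a single selected column as a one-column matrix, so rowMeans() still works; without it, cpm_mat[, "S2"] collapses to a plain vector and rowMeans() stops with an error.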