Question: WGCNA: difficulty making the clinical trait data file match the cluster-filtered samples
elizabethR wrote (21 months ago):

Hi

I would be very grateful for some help.

I am trying to use WGCNA to perform network analysis on TCGA RNASeq data.

I am at the Data Input and Cleaning stage. After using clustering to exclude outlying samples, I am having difficulty getting my clinical trait data to align and match with my RNASeq data. Computer says no, but I don't understand why. I've been following the R code from https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-01-dataInput.pdf and also their R code chapter, which I found through Google while trying to troubleshoot this problem, but the code from neither works.

Let me describe my pipeline to date.

R is set up and the working directory is set to where the data files are. I was advised to call disableWGCNAThreads() at the beginning, which I did. I loaded the read data as data and the clinical traits as traits. The file of data reads was formatted with all the gene names in the first column (column A) of my CSV file and the TCGA patient IDs in the top row, so that each patient's data is in a column, as advocated by the WGCNA handbook. The clinical trait data was formatted in the same way.

I transposed the datafile:

datExpr0=as.data.frame(t(data[, -c(1)]))

names(datExpr0) = data$GeneID
rownames(datExpr0) = names(data)[-c(1)]
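
As a minimal sketch of what this transposition does, here is the same code run on a toy table. The gene names, sample names, and values are made up for illustration; only the GeneID column name comes from the code above.

```r
# Toy stand-in for the expression CSV: genes in rows, samples in columns.
# All names and values here are hypothetical.
data <- data.frame(
  GeneID = c("geneA", "geneB", "geneC"),
  S1 = c(5.1, 0.0, 2.3),
  S2 = c(4.8, 0.2, 2.9),
  stringsAsFactors = FALSE
)

# Drop the GeneID column, transpose, and restore the labels.
datExpr0 <- as.data.frame(t(data[, -1]))
names(datExpr0) <- data$GeneID
rownames(datExpr0) <- names(data)[-1]

# Samples are now rows and genes are columns.
print(dim(datExpr0))
print(rownames(datExpr0))
```

After this step the sample IDs live in rownames(datExpr0), which is what the later matching step relies on.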

I then checked for too many missing values with gsg:

gsg = goodSamplesGenes(datExpr0, verbose = 3);
gsg$allOK

No missing values. So far so good.

I made my sample cluster tree, applied my cut-off and kept the remaining cluster (removing 10 of my 41 samples). I told R to keep these:

keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
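
For illustration, the clustering and filtering step might look roughly like the sketch below. It runs on simulated data, and it swaps in base R's cutree() so that it works without the package; the linked tutorial uses WGCNA's cutreeStatic() instead. The cut height of 100 and the sample names are assumptions chosen for the toy data.

```r
set.seed(1)

# Toy expression table: 6 samples (rows) x 20 genes (columns), hypothetical values.
datExpr0 <- as.data.frame(matrix(
  rnorm(6 * 20), nrow = 6,
  dimnames = list(paste0("S", 1:6), paste0("g", 1:20))
))
# Make the last sample an obvious outlier.
datExpr0["S6", ] <- datExpr0["S6", ] + 50

# Cluster the samples and cut the tree at a height picked from plot(sampleTree).
sampleTree <- hclust(dist(datExpr0), method = "average")
cutHeight <- 100  # assumed value for this toy data
clust <- cutree(sampleTree, h = cutHeight)

# Keep the largest cluster; the outlier ends up in a cluster of its own.
keepCluster <- as.integer(names(which.max(table(clust))))
keepSamples <- (clust == keepCluster)
datExpr <- datExpr0[keepSamples, ]

print(rownames(datExpr))
```

The row names of datExpr after filtering are the sample IDs that the trait table has to be matched against.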

Loaded trait data. This had 16 fields for the original 42 samples.

dim(traits)
[1] 16 42

There were no extra data columns that needed removing so formatting looked like this with no -c() command:

allTraits = traits[,]
allTraits = allTraits[, c(1, 2:42) ]
dim(allTraits)
[1] 16 42

But now I reach the point where I am meant to create a data frame for the clinical trait data that parallels the clustered data file and contains data only for the 31 patients I've retained after clustering, and this is where it goes wrong:

# Form a data frame analogous to expression data that will hold the clinical traits. 

Samples = rownames(datExpr); 
traitRows = match(Samples, allTraits$Trait); 
datTraits = allTraits[traitRows, -1]; 
rownames(datTraits) = allTraits[traitRows, 1];

R can't do it and says:

Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names':

It seems to think I'm working on creating the same data frame rather than two. How has it got this impression from the code? I tried re-enabling the WGCNA threads, in case I had removed its capacity to create two data frames simultaneously, by removing the files, typing enableWGCNAThreads() and repeating the above, but it did not work.

Can anyone in the world help me with this?

Thank you very much Jude


I believe the error is just that you are attempting to set a vector of non-unique values as the rownames to a data-frame (datTraits), which is not permitted in R (note that it is permitted for a data-matrix).
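
A minimal sketch (not from the original reply) of how this error can arise. It assumes, hypothetically, that the lookup column holds trait names rather than sample IDs, so match() finds nothing and returns only NA; all names below are made up.

```r
# Hypothetical sample IDs, and a lookup column that holds trait names instead.
samples <- c("S1", "S2", "S3")
lookup  <- c("age", "stage", "sex")

# Nothing matches, so every index is NA.
traitRows <- match(samples, lookup)
print(traitRows)

df <- data.frame(value = c(1, 2, 3))

# Assigning the all-NA vector as row names of a data frame fails,
# because the repeated NA values count as duplicates. Capture the message.
msg <- tryCatch(
  suppressWarnings({ rownames(df) <- lookup[traitRows]; "no error" }),
  error = function(e) conditionMessage(e)
)
print(msg)

# A matrix, by contrast, accepts repeated row names.
m <- matrix(1:4, nrow = 2)
rownames(m) <- c("a", "a")
print(rownames(m))
```

If the real traitRows vector is all NA, that would also be consistent with the warning in the question listing no offending values after the colon.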

What is the output of allTraits$Trait ? I believe this should merely be a vector of sample names / IDs that match those in the expression matrix used for network construction.
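
One possible cause, sketched under assumptions: the question says the trait file has the same layout as the expression file, with patients in columns. If so, the Trait column would hold trait names, and the table would need transposing before matching. The table below is hypothetical; only the Trait column name is taken from the question's code.

```r
# Hypothetical trait table with traits in rows and patients in columns.
traits <- data.frame(
  Trait = c("age", "stage"),
  S1 = c(61, 2), S2 = c(58, 3), S3 = c(70, 1),
  stringsAsFactors = FALSE
)
samples <- c("S1", "S3")  # stand-in for rownames(datExpr) after filtering

# In this layout the Trait column holds trait names, so nothing matches.
print(match(samples, traits$Trait))

# Transpose so that samples become rows, as was done for the expression data.
allTraits <- as.data.frame(t(traits[, -1]))
names(allTraits) <- traits$Trait

# The row names are now sample IDs and can be matched directly.
traitRows <- match(samples, rownames(allTraits))
stopifnot(!anyNA(traitRows))
datTraits <- allTraits[traitRows, , drop = FALSE]
print(datTraits)
```

Whether this applies depends on what allTraits$Trait actually contains, which is why its output matters here.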

written 21 months ago by Kevin Blighe (modified 7 weeks ago)

That doesn't work either. All I have is a column of numbers from 1 to 31 (the number of samples) and a column next to it that contains only NA. I do not have sufficient coding knowledge to know how to fix this myself. I am perplexed that the r coding in the WGCNA handbook would be so redundant :(

written 21 months ago by elizabethR

"I am perplexed that the r coding in the WGCNA handbook would be so redundant"

That's a major issue generally, in bioinformatics.


Can you please paste the output of rownames(datExpr) and allTraits$Trait ?

written 21 months ago by Kevin Blighe (modified 7 weeks ago)