Hello,
I am very new to bioinformatics (and statistics) and have a basic question that I haven't been able to fully answer on my own.
I have been using R to do some basic analysis of RNA-seq data. I know that "tidy data" conventions say to organize my data by putting observations into rows and variables into columns. However, I am not sure which part of my data would be considered the "observations" and which are the "variables".
Below is R code that generates one row of data formatted like my dataset:
# One row of simulated read counts for a single gene across nine samples
my_data <- data.frame(t(sample(20, 9)))
names(my_data) <- c("Subj_1_Cell_Type_X", "Subj_1_Cell_Type_Y", "Subj_1_Cell_Type_Z",
                    "Subj_2_Cell_Type_X", "Subj_2_Cell_Type_Y", "Subj_2_Cell_Type_Z",
                    "Subj_3_Cell_Type_X", "Subj_3_Cell_Type_Y", "Subj_3_Cell_Type_Z")
rownames(my_data) <- "Gene_ID"
In my dataset, I have approximately 14,000 genes as rows and about 1,000 subject_cell_type columns.
I have left the data in this format so far, but I am not sure that it is correct. I have found many resources discussing the difference between observations and variables, but the format of my data has still left me unsure. I believe that I should transpose my data and treat each subject_cell_type as an observational unit and the gene read counts as my variables.
Is this the correct interpretation?
Also, if anyone has any informative links on distinguishing between observations and variables, I would be really grateful!
Thank you!
Thank you for clearing that up for me! I'm relieved, as I was worried that the (limited) work I had already done was going to need to be scrapped because I forgot to check the structure of my data. I won't make this mistake again!
I will look into creating the columns that you mentioned also. Most of my confusion came from the combined subject/cell labels and splitting the two into distinct columns would definitely help me visualize and think about the data in a more intuitive way.
Hey, no problem! Also, I edited my answer because I counted the columns wrong. There will be 4 columns, not 5.
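For anyone following along, here is a minimal base R sketch of what splitting the combined subject/cell-type labels might look like. It assumes the four columns are gene, subject, cell type, and count, which is my reading of the thread rather than something stated explicitly. The toy matrix, the gene names (Gene_A, Gene_B), and the column names (gene, subject, cell_type, count) are all made up for illustration.

```r
# Toy matrix shaped like the data in the question: genes in rows,
# subject/cell-type samples in columns (values are random placeholders).
set.seed(1)
samples <- paste0("Subj_", rep(1:3, each = 3), "_Cell_Type_", c("X", "Y", "Z"))
counts <- matrix(sample(20, 18, replace = TRUE), nrow = 2,
                 dimnames = list(c("Gene_A", "Gene_B"), samples))

# Reshape to long format: one row per gene/sample combination.
long <- as.data.frame(as.table(counts), stringsAsFactors = FALSE)
names(long) <- c("gene", "sample", "count")

# Split the combined label into separate subject and cell-type columns
# (the regular expressions assume the "Subj_<id>_Cell_Type_<type>" pattern).
long$subject   <- sub("^Subj_([^_]+)_Cell_Type_.*$", "\\1", long$sample)
long$cell_type <- sub("^Subj_[^_]+_Cell_Type_(.*)$", "\\1", long$sample)

# Keep the four columns: gene, subject, cell type, count.
long <- long[, c("gene", "subject", "cell_type", "count")]
head(long)
```

With roughly 14,000 genes and 1,000 samples, the long table would have about 14 million rows, so it may be more practical to keep the wide matrix for analysis and build the long version only for the genes you want to plot or summarise.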