I have been working with my RNAseq output after running it thru the Deseq package in R. I have a list "genes.txt" of the genes in my 9 samples, the pvals, log2 fold change etc. I would like to compare lets say: Control vs. Knockdown and then use that input to use for GSEA.
Here is my phenotype file:
name type sample1 Control sample2 KD sample3 OE sample4 Control sample5 KD sample6 OE sample7 Control sample8 KD sample9 OE
Samples are grouped in 3s. So 1,2,3 are from the same cell line and so on for the rest.
This is the current code I am working with that provides the "global" values of all these samples, and Id like the output to take into account which values belong with which sample/condition(type). Is there a simplistic way to do this ? Id like to end up comparing the 3 controls vs the 3 KDs in GSEA as the next step.
Here is the code I currently have:
source("http://bioconductor.org/biocLite.R") biocLite("ballgown") biocLite library("DESeq2") countData <- as.matrix(read.csv("transcript_count_matrix.csv", row.names = "transcript_id")) colData <- read.csv("pheno_data", sep="\t", row.names = 1) all(rownames(colData) %in% colnames(countData)) countData<-countData[, rownames(colData)] all(rownames(colData) == colnames(countData)) #create deseq DataSet from count matrix and labels dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ type) #Run the default analysis for DESeq2 and generate results table dds <- DESeq(dds) res <- results(dds) #Sort by adjusted p-value and display (resOrdered <- res[order(res$padj), ]) write.table(resOrdered, file = "genes.txt", sep ="\t") #SORT BY CONDITION TYPE # READ INTO GSEA SECTION
Any guidance on this I would greatly appreciate!
Thanks so much,