I started from 40 fastq files for running the RNASeq Pipeline (Tophat - cufflinks - cuffmerge), then I divided the 40 tophat folders/files to 8 processes of running cuffdiff (each process has 5 Tophat input files, accepted_hits.bam) with the same merge.gtf (from cuffmerge). I now need to put the cuffdiff results in a big table so as to compare the fkpm values among the 40 samples. The name of my tophat files are s1, s2, s3, ..., s39, and s40. Could someone tell me how to merge the cuffdiff results? thanks
Question: How To Merge Cuffdiff Results To Make A Big Table?
0
newDNASeqer • 710 wrote:
ADD COMMENT
• link
•
modified 7.3 years ago
by
seidel ♦ 7.1k
•
written
7.3 years ago by
newDNASeqer • 710
1
seidel ♦ 7.1k wrote:
You could also aggregate your cuffdiff results fairly easily with R if you're familiar with it. Cuffdiff creates an output directory, so assuming directory names based on your tophat file names, the following would work to open files in 40 directories, grab out the column of interest, combine it to one table, and create an output file:
# declare a variable to hold fpkms
fpkm <- c()
# create directory names
dirnames <- paste("s", 1:40, sep=""))
# loop through all directories and grab fpkm columns
for( i in 1:length(dirnames) ){
fname <- paste(dirnames[i], "/genes.fpkm_tracking",sep="")
x <- read.table(file=fname, sep="\t", header=T, as.is=T)
fpkm <- cbind(fpkm, x[,"FPKM"])
}
# name the columns
colnames(fpkm) <- dirnames
# name the rows, they're all in the same order
rownames(fpkm) <- x[,1]
write.table(fpkm, file="fpkm.txt", sep="\t", col.names=NA)
Purists would say you don't need for loops in R, but the code would be more obscure.
0
Ashutosh Pandey ♦ 12k wrote:
Use the paste command in unix. It will concatenate files side by side. Then you can remove the redundant columns from the big file.
Please log in to add an answer.
Use of this site constitutes acceptance of our User
Agreement
and Privacy
Policy.
Powered by Biostar
version 2.3.0
Traffic: 1494 users visited in the last hour
Which of the cuffdiff output files do you want to merge?
the file named gene_exp.diff
They should all have the same number of lines, so you can trivially script something in python/perl/whatever (I'm presuming that you just want the coordinates, gene name, and XLOC ID). BTW, is there a reason you're not just running one instance of cuffdiff with all of the files?