Question

How To Merge Cuffdiff Results To Make A Big Table?

0

Entering edit mode

11.8 years ago

newDNASeqer ▴ 790

I started from 40 fastq files for running the RNASeq Pipeline (Tophat - cufflinks - cuffmerge), then I divided the 40 tophat folders/files to 8 processes of running cuffdiff (each process has 5 Tophat input files, accepted_hits.bam) with the same merge.gtf (from cuffmerge). I now need to put the cuffdiff results in a big table so as to compare the fkpm values among the 40 samples. The name of my tophat files are s1, s2, s3, ..., s39, and s40. Could someone tell me how to merge the cuffdiff results? thanks

cuffdiff • 4.7k views

ADD COMMENT • link updated 11.8 years ago by seidel 11k • written 11.8 years ago by newDNASeqer ▴ 790

0

Entering edit mode

Which of the cuffdiff output files do you want to merge?

ADD REPLY • link 11.8 years ago by Devon Ryan 105k

0

Entering edit mode

the file named gene_exp.diff

ADD REPLY • link 11.8 years ago by newDNASeqer ▴ 790

0

Entering edit mode

They should all have the same number of lines, so you can trivially script something in python/perl/whatever (I'm presuming that you just want the coordinates, gene name, and XLOC ID). BTW, is there a reason you're not just running one instance of cuffdiff with all of the files?

ADD REPLY • link 11.8 years ago by Devon Ryan 105k

score 2 · Answer 1 · 2013-09-25

You could also aggregate your cuffdiff results fairly easily with R if you're familiar with it. Cuffdiff creates an output directory, so assuming directory names based on your tophat file names, the following would work to open files in 40 directories, grab out the column of interest, combine it to one table, and create an output file:

# declare a variable to hold fpkms
fpkm <- c()
# create directory names
dirnames <- paste("s", 1:40, sep=""))
# loop through all directories and grab fpkm columns
for( i in 1:length(dirnames) ){
  fname <- paste(dirnames[i], "/genes.fpkm_tracking",sep="")
  x <- read.table(file=fname, sep="\t", header=T, as.is=T)
  fpkm <- cbind(fpkm, x[,"FPKM"])
}
# name the columns
colnames(fpkm) <- dirnames
# name the rows, they're all in the same order
rownames(fpkm) <- x[,1]

write.table(fpkm, file="fpkm.txt", sep="\t", col.names=NA)

Purists would say you don't need for loops in R, but the code would be more obscure.

score 0 · Answer 2 · 2013-09-23

0

Entering edit mode

11.8 years ago

Ashutosh Pandey 12k

Use the paste command in unix. It will concatenate files side by side. Then you can remove the redundant columns from the big file.

See http://en.wikipedia.org/wiki/Paste_(Unix)

ADD COMMENT • link 11.8 years ago by Ashutosh Pandey 12k