Question: How To Merge Cuffdiff Results To Make A Big Table?
gravatar for newDNASeqer
7.3 years ago by
United States
newDNASeqer710 wrote:

I started from 40 fastq files for running the RNASeq Pipeline (Tophat - cufflinks - cuffmerge), then I divided the 40 tophat folders/files to 8 processes of running cuffdiff (each process has 5 Tophat input files, accepted_hits.bam) with the same merge.gtf (from cuffmerge). I now need to put the cuffdiff results in a big table so as to compare the fkpm values among the 40 samples. The name of my tophat files are s1, s2, s3, ..., s39, and s40. Could someone tell me how to merge the cuffdiff results? thanks

cuffdiff • 3.3k views
ADD COMMENTlink modified 7.3 years ago by seidel7.1k • written 7.3 years ago by newDNASeqer710

Which of the cuffdiff output files do you want to merge?

ADD REPLYlink written 7.3 years ago by Devon Ryan98k

the file named gene_exp.diff

ADD REPLYlink written 7.3 years ago by newDNASeqer710

They should all have the same number of lines, so you can trivially script something in python/perl/whatever (I'm presuming that you just want the coordinates, gene name, and XLOC ID). BTW, is there a reason you're not just running one instance of cuffdiff with all of the files?

ADD REPLYlink written 7.3 years ago by Devon Ryan98k
gravatar for seidel
7.3 years ago by
United States
seidel7.1k wrote:

You could also aggregate your cuffdiff results fairly easily with R if you're familiar with it. Cuffdiff creates an output directory, so assuming directory names based on your tophat file names, the following would work to open files in 40 directories, grab out the column of interest, combine it to one table, and create an output file:

# declare a variable to hold fpkms
fpkm <- c()
# create directory names
dirnames <- paste("s", 1:40, sep=""))
# loop through all directories and grab fpkm columns
for( i in 1:length(dirnames) ){
  fname <- paste(dirnames[i], "/genes.fpkm_tracking",sep="")
  x <- read.table(file=fname, sep="\t", header=T,
  fpkm <- cbind(fpkm, x[,"FPKM"])
# name the columns
colnames(fpkm) <- dirnames
# name the rows, they're all in the same order
rownames(fpkm) <- x[,1]

write.table(fpkm, file="fpkm.txt", sep="\t", col.names=NA)

Purists would say you don't need for loops in R, but the code would be more obscure.

ADD COMMENTlink written 7.3 years ago by seidel7.1k
gravatar for Ashutosh Pandey
7.3 years ago by
Ashutosh Pandey12k wrote:

Use the paste command in unix. It will concatenate files side by side. Then you can remove the redundant columns from the big file.


ADD COMMENTlink written 7.3 years ago by Ashutosh Pandey12k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1494 users visited in the last hour