How To Merge Cuffdiff Results To Make A Big Table?
2
0
Entering edit mode
10.6 years ago
newDNASeqer ▴ 760

I started from 40 fastq files for running the RNASeq Pipeline (Tophat - cufflinks - cuffmerge), then I divided the 40 tophat folders/files to 8 processes of running cuffdiff (each process has 5 Tophat input files, accepted_hits.bam) with the same merge.gtf (from cuffmerge). I now need to put the cuffdiff results in a big table so as to compare the fkpm values among the 40 samples. The name of my tophat files are s1, s2, s3, ..., s39, and s40. Could someone tell me how to merge the cuffdiff results? thanks

cuffdiff • 4.3k views
ADD COMMENT
0
Entering edit mode

Which of the cuffdiff output files do you want to merge?

ADD REPLY
0
Entering edit mode

the file named gene_exp.diff

ADD REPLY
0
Entering edit mode

They should all have the same number of lines, so you can trivially script something in python/perl/whatever (I'm presuming that you just want the coordinates, gene name, and XLOC ID). BTW, is there a reason you're not just running one instance of cuffdiff with all of the files?

ADD REPLY
2
Entering edit mode
10.6 years ago
seidel 11k

You could also aggregate your cuffdiff results fairly easily with R if you're familiar with it. Cuffdiff creates an output directory, so assuming directory names based on your tophat file names, the following would work to open files in 40 directories, grab out the column of interest, combine it to one table, and create an output file:

# declare a variable to hold fpkms
fpkm <- c()
# create directory names
dirnames <- paste("s", 1:40, sep=""))
# loop through all directories and grab fpkm columns
for( i in 1:length(dirnames) ){
  fname <- paste(dirnames[i], "/genes.fpkm_tracking",sep="")
  x <- read.table(file=fname, sep="\t", header=T, as.is=T)
  fpkm <- cbind(fpkm, x[,"FPKM"])
}
# name the columns
colnames(fpkm) <- dirnames
# name the rows, they're all in the same order
rownames(fpkm) <- x[,1]

write.table(fpkm, file="fpkm.txt", sep="\t", col.names=NA)

Purists would say you don't need for loops in R, but the code would be more obscure.

ADD COMMENT
0
Entering edit mode
10.6 years ago

Use the paste command in unix. It will concatenate files side by side. Then you can remove the redundant columns from the big file.

See http://en.wikipedia.org/wiki/Paste_(Unix)

ADD COMMENT

Login before adding your answer.

Traffic: 2417 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6