Question: Set control vs experiment for gage/pathview!
gravatar for Parham
6.3 years ago by
Parham1.5k wrote:

I am following GAGE workflow (1). The sample data is from ArrayExpress and it contains a set of control and a set of experiment (KD1 and KD2). I quote from the workflow "As an example, below is the code I used to map, index and process the read data for the first sample (ERR127302). You can write a shell script to do so for all samples (ERR127302-9)." (1).

The code is:

> tophat2 -p 8 -o tophat_out_1 ref/hg19 ERR127302_1.fastq.gz ERR127302_2.fastq.gz

If I include all files in the mapping script above, won't they just be shuffled all together?!!! I couldn't figure out how should I set in the pathway the control vs the experiment? Any help is appreciated.

I am a new user to Biostars, so I only have 5 posts per day. I may not be able to reply to all answers but I am reading your comments. 



1. RNA-Seq Data Pathway and Gene-set Analysis Workflows, Weijun Luo, February 6, 2014.


control vs experiment gage • 2.4k views
ADD COMMENTlink modified 6.3 years ago by Devon Ryan96k • written 6.3 years ago by Parham1.5k
gravatar for Devon Ryan
6.3 years ago by
Devon Ryan96k
Freiburg, Germany
Devon Ryan96k wrote:

When Weijun Luo wrote that, he meant that the script would iterate over the samples, writing the output to different files, not that tophat2 should just be run once with all of the samples together (that would, indeed, make life difficult).

ADD COMMENTlink modified 8 months ago by RamRS30k • written 6.3 years ago by Devon Ryan96k

I see! Then how do I set control and experiment in the command lines? I cannot understand by just reading commands, since I am very new to this field. Should I rename the accepted hits files to control or experiment?! So many stupid questions I have!

ADD REPLYlink written 6.3 years ago by Parham1.5k

You needn't rename the BAM files that tophat produces, just use the -o option from tophat to put them in different directories. You end up specifying which samples belong to which group within R, when you run the gage() command (see section 3.4 of the workflow you're looking at). The idea is that you have normalized counts in a matrix (cnts.norm) with samples as columns. The column numbers for samples in the two groups are then provided as the ref= and samp= parameters.

ADD REPLYlink written 6.3 years ago by Devon Ryan96k

This line you mean?

cnts.kegg.p <- gage(cnts.norm, gsets =, ref = ref.idx, samp = samp.idx, compare ="unpaired")

So should I leave the ref.idx and samp.idx as they are or should I change them. The problem I am having is how does this matrix column numbers are set to ref and samp?

ADD REPLYlink modified 8 months ago by RamRS30k • written 6.3 years ago by Parham1.5k

Yes, that's the line. The actual indices for the samples will depend on how you make the matrices. In the workflow that you're looking at, the order of the samples is the order of the file names in fls in section 3.2. Personally, I'd use htseq-count or featureCounts and then just load the counts, but that's me. I'd also just export rlog2 counts from DESeq2 rather than bothering to do things manually, but that's rather outside the purview of that workflow.

ADD REPLYlink written 6.3 years ago by Devon Ryan96k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1883 users visited in the last hour