Question: If I am right about fold change in RNA-seq
gravatar for A
4.4 years ago by
A3.8k wrote:

hi, I have treatments A, B and C and controls a, b and c for 33603 genes. my adviser asked me to find 'Fold change difference in expression level of ''each gene'' at ''each time point '' when treatments compared to control condition'

then I subtracted raw counts or FPKM for treatment A and control a and so on. If I got her mean properly???? I presented the cuffdiff output but she told I should provide something else but I can't get her. could you get her please?

rna-seq next-gen • 1.5k views
ADD COMMENTlink modified 4.4 years ago by Daniel3.8k • written 4.4 years ago by A3.8k
gravatar for Michael Dondrup
4.4 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

Fold change are not differences but fractions, I guess your supervisor meant you to compute fractions: A/a, B/b, C/c etc. Maybe you were thinking of microarray like M-values, but these are defined as the log of the fold change and can be computed as difference of logs. See Understanding up and down regulated genes from LOG2 foldchange or foldchange

Also, FPKM's have been heavily criticised, so why use them?

ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by Michael Dondrup47k

thank you, in cuffdiff output gene_exp.diff folder A with a, b and c, B with a and b and c and C with a and b and c were compared and I have fold change but instead of 33603 genes there are more than 160000 genes. and in this folder i have gene name instead of AGI then when i converted names to AGI because of duplicates I could not match the AGI with names, that is why I tried subtracting rowcounts

ADD REPLYlink written 4.4 years ago by A3.8k
  • which organism has 160k genes?
  • even if so I don't see what makes the difference between computing - vs /?
  • if the problem is to match rows, then you can use match() in R
  • if the genes were very different between samples, what would be the point in computing fold change?
ADD REPLYlink written 4.4 years ago by Michael Dondrup47k

sorry, my total genes are 33603 but as you know cuffdiff compute all of the possible combinations. for example A with a, a with B, B with b, b with C and C with c that is why in this folder I have 160000 genes. but In this folder I have gene name and she asked AGI then when I converted to AGI instead of 33000 genes for A versus a I have only 22000 because of duplicates then I got confused. she needs to know the expression level of A versus a and so on

ADD REPLYlink written 4.4 years ago by A3.8k

I highly recommend to clean up the disorder or whatever you would call this first and remove redundant gene entries. Is this maybe a de-novo transcriptome assembly with no reference genome? If not you should choose a method that will simply count or pseudo-count transcripts and doesn't give you more transcripts than there really are. Otherwise you might get trapped in a 'garbage in, garbage out' type of analysis. Btw.: maybe you supervisor should get involved with us directly?

ADD REPLYlink written 4.4 years ago by Michael Dondrup47k

thank you so much dear Michael, I am working with Arabidopsis with good reference genome. she is trying to make a challenge because I already presented her the significantly differential expressed genes but she asked something else. anyway I am thankful and I hope I provide what she asked.

ADD REPLYlink written 4.4 years ago by A3.8k
gravatar for Daniel
4.4 years ago by
Cardiff University
Daniel3.8k wrote:

Try out the HTSeq and EdgeR parts of what I have documented here. It's basic, but I think it works better than cufflinks (which I'm not a massive fan of), and you can get into more complicated stuff after.

ADD COMMENTlink written 4.4 years ago by Daniel3.8k

sorry, I have A, B and C treatment against a, b and c as control I used subhead to prepare the read counts table then I found this code


load table of read counts.

data <- as.matrix(read.table("countTable.txt"))

assign groups to the samples in the counts table

g <- c(0,0,0,1,1,1)

get the library sizes (total counts in annotated genes)

libSizes <- as.vector(colSums(data))

edgeR command pipeline (basically the same for each sample)

d <- DGEList(counts=data,group=g,lib.size=libSizes) d <- calcNormFactors(d) d <- estimateCommonDisp(d) d <- estimateTagwiseDisp(d) <- exactTest(d) results <-,n = length(data[,1]))

write the output to a text file


but in this part
# assign groups to the samples in the counts table g <- c(0,0,1,1) I could not figure out what i should type then I typed g <- c(0,0,0,1,1,1) result is like below head(mycounts[,1:4])

          logFC   logCPM       PValue          FDR

AT5G59310 -4.579619 5.592271 1.770797e-26 5.950233e-22 AT3G09640 -4.910870 1.084470 2.126015e-17 3.571918e-13 AT2G47180 -3.052163 4.195431 5.960477e-17 6.676132e-13 AT5G59720 -8.509985 8.755126 2.198775e-15 1.847081e-11 AT4G21320 -2.979331 4.810421 5.672696e-15 3.812278e-11 AT1G72660 -4.280324 2.181104 6.293873e-12 3.524779e-08

then what I did wrong because i need FC for all of my samples may you please consider what I did above and what I should type for g thank you

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by A3.8k

That looks like it's worked to me, what is the problem with these results?

ADD REPLYlink written 4.4 years ago by Daniel3.8k

Oh wait, you mean you want to do A <> a, B <> b, C <> c? in that case, make your sample descriptions more accurate, then do the 'exactTest' with the correct comparisons. But you can't only be doing 1 sample vs 1 sample right? you've got replicates of each?

ADD REPLYlink written 4.4 years ago by Daniel3.8k

For example, I load a 'samples.txt' file with the descriptions which looks like this:

sample  library_layout  condition       shortname       countf
ES1     PAIRED  16h-Light       ES1     ES1.count
ES2     PAIRED  16h-Dark        ES2     ES2.count
ES3     PAIRED  16h-Light       ES3     ES3.count
ES4     PAIRED  16h-Dark        ES4     ES4.count
ES5     PAIRED  16h-Light       ES5     ES5.count
ES6     PAIRED  16h-Dark        ES6     ES6.count

Then your DGElist command includes the column you want to assess:

d = DGEList(counts=counts, group=samples$condition)

Then do your exactTest with the comparison:

de = exactTest(d, pair=c("16h-Light", "16h-Dark"))

If you have multiple comparisons, do "de1, de2, de3" etc

ADD REPLYlink written 4.4 years ago by Daniel3.8k

sorry, you prepared your read counts with HTSeq but I extracted my read counts by subread and my file is like below


      Pri.2h Pri.8h Pri.24h unPri.2h unPri.8h unPri.24h

AT1G01010 257 232 309 280 231 64 AT1G01020 289 279 289 303 243 82 AT1G01030 74 31 81 72 44 54 AT1G01040 1253 1250 1612 1349 1069 572 AT1G01046 13 14 24 21 23 4 AT1G01050 1118 1178 1508 1341 1174 625

then my samples.txt how should be???

ADD REPLYlink written 4.4 years ago by A3.8k

Samples.txt in this example, is a file describing your samples, so you can choose to compare them. I am guessing, but I imagine yours would looks something like:

sample  time  condition
Pri.2h  2h  Pri
Pri.8h  8h  Pri
Pri.24h  24h  Pri
unPri.2h  2h  unPri
unPri.8h  8h  unPri
unPri.24h  24h  unPri

Then, your build would look like:

d1 = DGEList(counts=mycounts, group=samples$time)
d2 = DGEList(counts=mycounts, group=samples$condition)

and your tests like:

de2 = exactTest(d, pair=c("Pri", "unPri"))

If you're trying to do multi-group comparisons check out the ANOVA/glm parts of the manual (page 26 onwards: but I don't think you'd be to successful with only 2 reps.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Daniel3.8k

thank you telling de2 = exactTest(d, pair=c("Pri.24h", "Pri.2h")) Error in exactTest(d, pair = c("Pri.24h", "Pri.2h")) : At least one element of given pair is not a group. Groups are: Pri.24h Pri.2h Pri.8h unPri.24h unPri.2h unPri.8h

ADD REPLYlink written 4.4 years ago by A3.8k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 945 users visited in the last hour