Question: Single vs paired ends in quantification
gravatar for mgoldste
9 months ago by
mgoldste0 wrote:

I ran RSEM-Ebseq on some data, with paired and unpaired options and got different numbers for differentially expressed genes. Why might this be?

rna-seq • 330 views
ADD COMMENTlink modified 5 months ago by Corentin310 • written 9 months ago by mgoldste0

Please post the commands used and explain in more detail what you did.

ADD REPLYlink written 9 months ago by h.mon26k

I took the paired end reads (separate files denoted R1 and R2) for each sample time (two time periods, 3 duplicates). I ran RSEM from a script file with the paired option (so the script file contained R1 and R2) and through a separate script ran R1 and R2 individually.

I then ran ebseq either using the paired reads RSEM-1.3.1/rsem-run-ebseq GeneMat.txt 3,3 GeneMat.results

or RSEM-1.3.1/rsem-run-ebseq GeneMat.txt 6,6 GeneMat.results for unpaired,

then /RSEM-1.3.1/rsem-control-fdr --hard-threshold GeneMat.results 0.05 GeneMat.de_hard.txt for both results files individually. The genes with fdr < 0.05 was significantly more for the unpaired reads set than the paired.

ADD REPLYlink written 9 months ago by mgoldste0

R1 and R2 reads came from the same cDNA / RNA fragment, so when you count R1 and R2 individually and sum the results, you are essentially artificially multiplying the read counts by two. What would this entails?

Imagine you toss a coin 20 times, obtaining 6 heads and 14 tails. Performing a goodness-of-fit Chi-Square tells you the coin is fair (lets use R for the test):

coinTosses <- c(6,14)
chisq.test( coinTosses, p=c(0.5,0.5) )
Chi-squared test for given probabilities

data:  coinTosses
X-squared = 3.2, df = 1, p-value = 0.07364

Now artificially multiply the values obtained by 2:

chisq.test( 2*coinTosses, p=c(0.5,0.5))
Chi-squared test for given probabilities

data:  2 * coinTosses
X-squared = 6.4, df = 1, p-value = 0.01141
ADD REPLYlink modified 9 months ago • written 9 months ago by h.mon26k
gravatar for kristoffer.vittingseerup
9 months ago by
European Union
kristoffer.vittingseerup2.0k wrote:

If you have paired data and run in unpaired mode it will quantify the reads from each file - meaning that for all pairs you will doublecount the fragment.

ADD COMMENTlink written 9 months ago by kristoffer.vittingseerup2.0k
gravatar for Corentin
5 months ago by
Corentin310 wrote:

You might be interested in this post:

The first time I compared raw reads counts to RSEM’s expected counts, I encountered an unexpected trend: the expected counts were not slightly lower than the raw counts, they were consistently lower by a factor of 2. After thinking about this a bit, I considered the possibility that RSEM treats each pair of reads as a single unit given paired-end data. To confirm this, I selected a small subset of 10 million read pairs from one of my samples and ran RSEM twice: once in paired-end mode, and once in single-end mode disregarding the read pairing information. Consistent with my hypothesis, the expected counts produced in single-end mode were almost exactly 2 times to the expected counts produced in paired-end mode.

ADD COMMENTlink written 5 months ago by Corentin310
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 701 users visited in the last hour