Question: How much can paired-end vs. single-end RNA-seq reads influence expression level quantification?
4.8 years ago by
jack800 wrote:

I have data from an RNA-seq experiment that was performed on some biological samples.

I need to quantify gene expression from these read files and then run a differential gene expression analysis.

But the RNA-seq data I have is a bit unusual in terms of experimental design, and I'm wondering how much this can affect the differential gene expression results.

The problem is that for condition A the reads are paired-end, while for condition B the reads are single-end. Moreover, for condition D, some biological replicates are single-end and others are paired-end. All the data come from the same lab and the same sequencing platform.

I need to run DEG analyses for A vs. B, A vs. C, and C vs. B.

Does anyone have an idea how much this can affect the results of expression level quantification and DEG analysis?

modified 4.8 years ago by h.mon29k • written 4.8 years ago by jack800

You don't say, but did they also use the same library prep protocol? If not, you're sunk.

written 4.8 years ago by Michele Busby2.1k

One simple solution would be to use just single-end reads, i.e. discard the pairing information and treat the paired-end data as single-end. For gene-level expression (GEX), as opposed to transcript-level/isoform analysis, there is plenty of data to suggest that 1x50bp data is sufficient.


written 4.8 years ago by scottbrouilette50
4.8 years ago by
Devon Ryan94k (Freiburg, Germany) wrote:

It's not ideal, but if you ensure that D is included in the input counts, then you can model and compensate for the effect of paired-end vs. single-end. You could also use ComBat (from the sva package).

How much of an effect you'll see is always a bit hard to predict. I think the DESeq (or DESeq2) vignette uses an example dataset that has both SE and PE samples to show how to use more complex models. There, I recall that you definitely get a difference due to the sequencing type alone. So definitely don't read too much into an A vs. B comparison if you can't compensate for the PE vs. SE difference.
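To illustrate why including D matters, here is a minimal Python sketch (not DESeq2 itself, and not anyone's actual pipeline). It builds a design matrix with an intercept, condition indicators, and a paired-end indicator, then checks whether its columns are linearly independent. The sample layout, replicate numbers, and the helper names design_matrix and matrix_rank are all made up for illustration.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gauss-Jordan elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples):
    """Columns: intercept, one indicator per non-reference condition, PE indicator."""
    conditions = sorted({cond for cond, _ in samples})
    others = conditions[1:]  # the first condition (alphabetically) is the reference
    return [
        [1] + [int(cond == o) for o in others] + [int(lib == "PE")]
        for cond, lib in samples
    ]


# Hypothetical layout: A is all paired-end, B is all single-end.
a_and_b = [("A", "PE")] * 3 + [("B", "SE")] * 3
# Same layout plus D, which has replicates of both library types.
with_d = a_and_b + [("D", "PE")] * 2 + [("D", "SE")] * 2

for name, samples in [("A+B only", a_and_b), ("A+B+D", with_d)]:
    X = design_matrix(samples)
    full_rank = matrix_rank(X) == len(X[0])
    print(f"{name}: {len(X[0])} columns, rank {matrix_rank(X)}, full rank: {full_rank}")
```

With only A and B, condition and library type are perfectly confounded, so the design is rank deficient and the two effects cannot be separated. Once the mixed condition D is in the model, the library-type coefficient becomes estimable, under the assumption that the PE vs. SE effect is the same across conditions.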

written 4.8 years ago by Devon Ryan94k
4.8 years ago by
Irsan7.1k wrote:

Read pairing has very little influence on differential gene expression analysis (STAR + HTSeq + edgeR/DESeq/...). I had paired-end RNA-seq data for ~50 samples and analyzed the dataset in both single-end and paired-end mode. The Pearson correlation coefficient between the single-end and paired-end transcriptome-wide count data was >0.95 for all 50 samples. So my suggestion would be to run all your data in single-end mode, since the drop in quality will be negligible and you will be sure you can compare everything.
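A minimal sketch of this kind of check for one sample, in plain Python. The gene counts below are invented for illustration, and the log-scale correlation is an extra that is not part of the comparison described above; it is included only because raw-count correlations tend to be dominated by a few highly expressed genes.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-gene counts for one sample, processed in single-end mode
# and in paired-end mode (same genes, same order).
se_counts = [120, 0, 45, 980, 15, 300]
pe_counts = [131, 1, 40, 1010, 12, 322]

r_raw = pearson(se_counts, pe_counts)
# log1p handles zero counts and reduces the weight of highly expressed genes.
r_log = pearson([math.log1p(c) for c in se_counts],
                [math.log1p(c) for c in pe_counts])

print(f"Pearson r (raw counts): {r_raw:.4f}")
print(f"Pearson r (log1p counts): {r_log:.4f}")
```

In practice the two count vectors would come from the count tables produced by running the same sample through the pipeline twice, once per mode.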

written 4.8 years ago by Irsan7.1k

What if I process the paired-end samples in paired-end mode and the single-end samples in single-end mode? Are the results then comparable?

written 4.8 years ago by jack800

If you have no option to re-process the raw data and analyze all of it in single-end mode, then use the library type as a covariate, as suggested by Devon Ryan.

written 4.8 years ago by Irsan7.1k
4.8 years ago by
h.mon29k wrote:

If you are just mapping to a reference genome / transcriptome, probably the biggest difference would appear if you are trying to estimate "isoform" expression, as paired reads discriminate better between isoforms when mapping. Anyway, I would use just PE1 (the first read of each pair) for all samples; it is the simplest way to get rid of the "obvious" bias. And, depending on the quality of your PE2 reads, which tends to be lower than that of PE1 reads and is sometimes much worse, you may even get better results with just PE1.

P.S.: there are also "non-obvious" biases. For example, as single-end and paired-end libraries use different insert sizes, you may have a bias at this step of the process.


Thinking a bit more about it, your condition D could help you decide: you can compare your single-end and paired-end samples and check whether the variability within PE and within SE is the same as or different from the variability between SE and PE.
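One rough way to sketch that comparison in Python: compute the mean pairwise distance between replicates of the same library type and compare it with the mean distance between SE and PE replicates. The log-expression values and the helper name within_between are hypothetical, and Euclidean distance is only one of several reasonable choices (correlation or a PCA plot would serve the same purpose).

```python
import math
from itertools import combinations, product


def distance(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def within_between(se_samples, pe_samples):
    """Mean pairwise distance within library types and between them."""
    within = [
        distance(u, v)
        for group in (se_samples, pe_samples)
        for u, v in combinations(group, 2)
    ]
    between = [distance(u, v) for u, v in product(se_samples, pe_samples)]
    return sum(within) / len(within), sum(between) / len(between)


# Made-up log-expression profiles (three genes) for condition D replicates.
d_single_end = [[5.0, 2.0, 8.1], [5.2, 2.1, 8.0]]
d_paired_end = [[5.6, 2.6, 8.5], [5.5, 2.4, 8.7]]

within, between = within_between(d_single_end, d_paired_end)
print(f"mean distance within library type:   {within:.3f}")
print(f"mean distance between library types: {between:.3f}")
```

If the between-type distance is clearly larger than the within-type distance, the library type is contributing variability beyond ordinary replicate noise, and it should be modeled or removed (for example by processing everything as single-end).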

modified 4.8 years ago • written 4.8 years ago by h.mon29k