Question: How much can paired-end vs. single-end RNA-seq reads influence expression level quantification?
jack wrote:

I have RNA-seq data from an experiment on several biological samples.

I need to quantify gene expression from these read files and then run a differential gene expression analysis.

However, the RNA-seq data I have is a bit unusual in terms of experimental design, and I'm wondering how much this can affect the differential gene expression results.

The problem is that for condition A the reads are paired-end, while for condition B they are single-end. Moreover, some biological replicates of condition D are single-end and others are paired-end. All the data come from the same lab and the same sequencing platform.

I need to do DEG analysis for A vs. B, A vs. C and C vs. B.

Does anyone have an idea how much this can affect expression level quantification and the DEG analysis?

modified 3.9 years ago by h.mon • written 3.9 years ago by jack

You don't say, but did they also use the same library prep protocol? If not, you're sunk.

written 3.9 years ago by Michele Busby

One simple solution would be to use just single-end reads, i.e. discard the second mate of the paired-end data. For gene expression (as opposed to transcript-level/isoform analysis) there is plenty of data to suggest that 1x50bp data is sufficient.

 

written 3.9 years ago by scottbrouilette
Devon Ryan wrote:

It's not ideal, but if you ensure that D is included in the input counts, then you can model and compensate for the effect of paired-end vs. single-end. You could also use ComBat (from the sva package).
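To illustrate why D matters here, the following is a minimal sketch (not the DESeq2 or ComBat implementation) that builds a design matrix with a condition term and a library-type term and checks whether it has full column rank. The sample layout and the helper names `design_matrix`, `rank` and `is_estimable` are assumptions made up for this example. With only A (all PE) and B (all SE), the library-type column is a linear combination of the others, so the two effects cannot be separated; adding D, which has both SE and PE replicates, makes the library-type effect estimable.

```python
from fractions import Fraction


def design_matrix(samples, conditions):
    """One row per sample: intercept, condition dummies, paired-end indicator.

    `samples` is a list of (condition, library_type) tuples; the first entry
    of `conditions` is treated as the reference level.
    """
    others = conditions[1:]
    return [
        [1] + [int(cond == c) for c in others] + [int(lib == "PE")]
        for cond, lib in samples
    ]


def rank(matrix):
    """Matrix rank via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_estimable(samples, conditions):
    """True if condition and library-type effects can be separated."""
    x = design_matrix(samples, conditions)
    return rank(x) == len(x[0])


# Hypothetical layout: A is all paired-end, B is all single-end.
without_d = [("A", "PE"), ("A", "PE"), ("B", "SE"), ("B", "SE")]
# D has replicates of both library types.
with_d = without_d + [("D", "SE"), ("D", "SE"), ("D", "PE"), ("D", "PE")]

print(is_estimable(without_d, ["A", "B"]))    # False: library type is confounded with condition
print(is_estimable(with_d, ["A", "B", "D"]))  # True: D separates the two effects
```

In DESeq2 terms this would correspond to a design along the lines of `~ library_type + condition`, where `library_type` is a hypothetical column name in the sample table.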

How large an effect you'll see is always a bit hard to predict. I think the DESeq (or DESeq2) vignette uses an example dataset that has both SE and PE samples to show how to use more complex models. As I recall, there you definitely get a difference due to the sequencing type alone. So definitely don't read too much into an A vs. B comparison if you can't compensate for the PE vs. SE difference.

written 3.9 years ago by Devon Ryan
Irsan wrote:

Read pairing has very little influence on differential gene expression analysis (STAR + HTSeq + edgeR/DESeq/...). I had paired-end RNA-seq data for ~50 samples and analyzed the dataset in both single-end and paired-end mode. The Pearson correlation coefficient between the single-end and paired-end transcriptome-wide count data was >0.95 for all 50 samples. So my suggestion would be to run all your data in single-end mode: you will have a negligible drop in quality and you will be sure you can compare everything.
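The comparison described above can be sketched for a single sample as follows. The count vectors are made-up numbers for illustration, not data from the analysis mentioned in this answer, and `pearson` is a plain textbook implementation. Since Pearson correlation on raw counts tends to be dominated by a few highly expressed genes, the sketch also computes it on log2(count + 1).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene counts for one sample, quantified twice:
# once using read 1 only (single-end mode) and once using both mates.
se_counts = [0, 12, 250, 1300, 48, 5]
pe_counts = [0, 10, 265, 1250, 51, 7]

r_raw = pearson(se_counts, pe_counts)

# Same comparison on the log scale, which weights low and high counts more evenly.
r_log = pearson(
    [math.log2(c + 1) for c in se_counts],
    [math.log2(c + 1) for c in pe_counts],
)

print(f"raw counts: r = {r_raw:.3f}")
print(f"log2(count + 1): r = {r_log:.3f}")
```

A high correlation says the two quantifications rank genes similarly; it does not by itself rule out a systematic shift for a subset of genes, which is what the covariate approach addresses.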

written 3.9 years ago by Irsan

What if I use paired-end mode for the samples that are paired-end and single-end mode for the samples that are single-end? Would the results then be comparable?

written 3.9 years ago by jack

If you have no option to re-process the raw data and analyze all of it in single-end mode, then use the library type as a covariate, as suggested by Devon Ryan.

written 3.9 years ago by Irsan
h.mon wrote:

If you are just mapping to a reference genome / transcriptome, probably the biggest difference would arise if you are trying to estimate "isoform" expression, as paired reads discriminate better between isoforms when mapping. Anyway, I would use just PE1 (read 1) for all samples; it is the simplest way to get rid of "obvious" bias. And, depending on the quality of your PE2 reads, which tends to be lower than that of PE1 reads and is sometimes much worse, you may get better results with just PE1.
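As a rough sketch of the "PE1 only" idea, the helper below takes a list of FASTQ file names and drops any file that looks like the second mate of a pair, so that every sample goes into the aligner as single-end. The naming convention (`_R1`/`_R2` or `_1`/`_2` before the extension), the file names and the function `single_end_inputs` are assumptions for illustration; adapt the pattern to however your files are actually named.

```python
import re

# Assumed convention: mate 2 files end in "_R2" or "_2" before .fastq/.fq(.gz).
MATE2 = re.compile(r"_(R?)2(\.(?:fastq|fq)(?:\.gz)?)$")


def single_end_inputs(files):
    """Return the files to use when every sample is treated as single-end.

    A file is dropped only if it looks like mate 2 AND the matching mate 1
    file is also in the list, so unpaired files are always kept.
    Caveat: single-end replicates that happen to be named "_1"/"_2" would be
    mistaken for a pair under this convention.
    """
    present = set(files)
    keep = []
    for name in files:
        mate1 = MATE2.sub(r"_\g<1>1\2", name)
        if mate1 != name and mate1 in present:
            continue  # second mate of a pair: skip it
        keep.append(name)
    return keep


# Hypothetical file names.
files = [
    "condA_rep1_R1.fastq.gz",
    "condA_rep1_R2.fastq.gz",
    "condB_rep1.fastq.gz",
    "condD_rep1_1.fq",
    "condD_rep1_2.fq",
    "condD_rep2.fastq.gz",
]

for name in single_end_inputs(files):
    print(name)
```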

P.S.: there are also "non-obvious" biases, e.g. as single-end and paired-end libraries use different insert sizes, you may have bias at this step of the process.

edit:

Thinking a bit more about it, your condition D could help you decide: you can compare your single-end and paired-end samples and check whether the variability within PE and within SE is similar to, or different from, the variability between SE and PE.
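One simple way to make that check concrete is to compare average sample-to-sample distances within a library type against distances between library types, using only the condition D replicates. The sketch below uses Euclidean distance on log2(CPM + 1) profiles; the counts, sample names and the helpers `log_cpm`, `distance` and `within_between` are all made up for illustration, and a PCA or clustering plot from your DE package would serve the same purpose.

```python
import math
from itertools import combinations


def log_cpm(counts):
    """log2(counts-per-million + 1) for one sample."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_between(samples):
    """Mean pairwise distance within and between library types.

    `samples` maps sample name -> (library_type, per-gene counts).
    """
    profiles = [(lib, log_cpm(counts)) for lib, counts in samples.values()]
    within, between = [], []
    for (lib1, p1), (lib2, p2) in combinations(profiles, 2):
        (within if lib1 == lib2 else between).append(distance(p1, p2))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical condition D replicates (4 genes each).
condition_d = {
    "D_SE1": ("SE", [100, 200, 300, 400]),
    "D_SE2": ("SE", [110, 190, 310, 390]),
    "D_PE1": ("PE", [200, 200, 200, 400]),
    "D_PE2": ("PE", [210, 190, 210, 390]),
}

within, between = within_between(condition_d)
print(f"mean distance within library type:  {within:.2f}")
print(f"mean distance between library types: {between:.2f}")
```

If the between-type distance is clearly larger than the within-type distance, as in this made-up example, the library type is contributing variation of its own and should be accounted for.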

modified 3.9 years ago • written 3.9 years ago by h.mon