Question

Combining data from single- and paired-end sequencing done on same samples

0

12 months ago

tise.suz ▴ 10

Hi everyone,

This is my first post in this community as I'm still fairly new to bioinformatics. Please forgive me if I make any mistakes.

My current issue is that results for the same samples do not match between the single-end and paired-end runs.

[Image: PCA of Samples]

That is, when I perform differential gene expression analysis on the count matrix generated by each run, I get many genes with opposite expression trends. I was advised by our genomics center to just merge the count matrices; however, I do not feel comfortable doing so, given that the PCA shows separation of the samples and the single-end library has more reads than the paired-end one. I know that I would at least have to perform some type of batch correction, since the runs were done separately. I would really appreciate any insight on how best to troubleshoot this issue.

bulk Illumina RNA-seq System Ovation single-end NextSeq 550 paired-end Universal • 1.1k views

score 3 · Accepted Answer · 2023-04-27

3

12 months ago

rpolicastro 13k

If you want to avoid the headache of batch correction, you could consider using only the R1 reads when aligning/quantifying the paired-end data.
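
As a rough illustration of the R1-only idea, here is a minimal Python sketch that picks the R1 FASTQ files out of a paired-end run so they can be handed to the aligner/quantifier in single-end mode. It assumes bcl2fastq-style file names (`<sample>_S<n>_L<lane>_R1_001.fastq.gz`); the sample names, the file list, and the helper `r1_files_by_sample` are all hypothetical, so adjust the pattern to whatever your core actually delivered.

```python
import re

# Assumed bcl2fastq-style name: <sample>_S<n>[_L<lane>]_R<1|2>_001.fastq.gz
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+?)_S\d+(?:_L\d{3})?_R(?P<mate>[12])_001\.fastq\.gz$"
)


def r1_files_by_sample(filenames):
    """Map each sample name to its sorted R1 FASTQ files, ignoring R2 mates."""
    selected = {}
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match is None or match.group("mate") != "1":
            continue  # skip R2 mates and anything that does not fit the pattern
        selected.setdefault(match.group("sample"), []).append(name)
    return {sample: sorted(files) for sample, files in selected.items()}


# Hypothetical file listing from the paired-end run.
paired_end_run = [
    "liver1_S1_L001_R1_001.fastq.gz",
    "liver1_S1_L001_R2_001.fastq.gz",
    "liver2_S2_L001_R1_001.fastq.gz",
    "liver2_S2_L001_R2_001.fastq.gz",
]

r1_only = r1_files_by_sample(paired_end_run)
for sample, files in r1_only.items():
    print(sample, files)
```

The R1 files would then go through the same single-end settings as the single-end run, so that both count matrices come from the same kind of quantification.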

0

That said, were the samples processed identically, with only the sequencing mode differing, or are there other potential sources of batch effects?

ADD REPLY • link 12 months ago by ATpoint 82k

0

Thank you, rpolicastro! I will try to run the paired-end with only the R1 reads.

ATpoint, I extracted the RNA samples, but the genomics core did the library preparation for both the single- and paired-end runs, so I'm not sure whether anything else was different. I'll ask them :)

ADD REPLY • link 12 months ago by tise.suz ▴ 10

0

Ask them if all samples were prepared the same way. SE and PE can be run on the same library; it's just a different mode on the sequencer.

ADD REPLY • link 12 months ago by ATpoint 82k

0

I'm guessing they were not prepared the same way. Here's the updated graph: [image]

The R1 samples are the R1 from the paired reads.

ADD REPLY • link 12 months ago by tise.suz ▴ 10

0

It does look like there is potentially an additional batch effect on top of the run type.

Mind remaking the plot but excluding the paired-end data when running PCA? Assuming CR and SE refer to your conditions, it might be possible to reduce this batch effect via a covariate in your model.

You should also start investigating what could have caused this batch effect. Is there some difference in sample prep it could be attributed to, such as a different RNA purification kit or read length? Or perhaps it's something more subtle, like the samples being prepared at different times.
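
To make the covariate idea concrete, here is a small self-contained Python sketch for a single gene. It is not what a DE package does internally, just an ordinary least-squares fit on made-up log2 expression values, comparing a model with and without a batch term. The labels `A`/`B` and `run1`/`run2`, the values, and the helper names are all hypothetical. In a tool such as DESeq2 the same idea is usually written as a design like `~ batch + condition`.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * value for row, value in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical log2 expression of one gene: (condition, run, value).
# Simulated without noise as 5 + 1 * (condition == "B") + 2 * (run == "run2").
samples = [
    ("A", "run1", 5.0), ("A", "run1", 5.0), ("A", "run1", 5.0), ("A", "run2", 7.0),
    ("B", "run1", 6.0), ("B", "run2", 8.0), ("B", "run2", 8.0), ("B", "run2", 8.0),
]
y = [value for _, _, value in samples]
is_b = [1.0 if cond == "B" else 0.0 for cond, _, _ in samples]
is_run2 = [1.0 if run == "run2" else 0.0 for _, run, _ in samples]

# Columns: intercept, condition (and run for the adjusted model).
naive = fit_ols([[1.0, c] for c in is_b], y)
adjusted = fit_ols([[1.0, c, r] for c, r in zip(is_b, is_run2)], y)

print(f"condition effect without batch term: {naive[1]:.2f}")
print(f"condition effect with batch term:    {adjusted[1]:.2f}")
```

In this toy data the condition estimate is inflated when the run is ignored and returns to the simulated value once the run is in the model. Note that this only works if condition and batch are not completely confounded; if every sample of one condition sits in one batch, the two effects cannot be separated.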

ADD REPLY • link 12 months ago by rpolicastro 13k

1

I'm sorry for the late reply. A lot has happened in the past few days. It turns out the center mislabeled the single-end samples... Also, the read lengths are different between paired-end and single-end. Thank you!
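
For anyone who runs into something similar: one way to spot a possible label swap when the same samples were sequenced twice is to correlate each sample's expression profile in one run against every sample in the other run and see which one it matches best. Below is a minimal Python sketch of that check. The counts, sample names, and helper functions are hypothetical, and with real data (different read lengths and depths) the correlations will be noisier, so treat a mismatch as a hint to investigate rather than as proof.

```python
import math


def log_cpm(counts):
    """Log2 counts-per-million with a pseudocount of 1 to avoid log(0)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_matches(run_a, run_b):
    """For each sample in run_a, name the run_b sample it correlates with best."""
    profiles_b = {name: log_cpm(counts) for name, counts in run_b.items()}
    matches = {}
    for name_a, counts_a in run_a.items():
        profile_a = log_cpm(counts_a)
        matches[name_a] = max(
            profiles_b, key=lambda name_b: pearson(profile_a, profiles_b[name_b])
        )
    return matches


# Hypothetical counts for five genes; the single-end labels are swapped on purpose
# and the single-end run is sequenced twice as deep.
paired_end_r1 = {"S1": [100, 10, 50, 5, 200], "S2": [10, 100, 5, 50, 200]}
single_end = {"S1": [20, 200, 10, 100, 400], "S2": [200, 20, 100, 10, 400]}

matches = best_matches(single_end, paired_end_r1)
suspects = [name for name, match in matches.items() if name != match]
print("best matches:", matches)
print("labels to double-check:", suspects)
```

Using log-CPM rather than raw counts keeps the comparison from being dominated by the difference in sequencing depth between the two runs.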

ADD REPLY • link 12 months ago by tise.suz ▴ 10