Question

Best way to handle samples with both single-end and paired-end data?

2.9 years ago

ichbinlynn93 ▴ 30

Hi, I have an experiment with both single-end (SE) and paired-end (PE) samples. Control: 3 SE samples (one file per sample) and 6 PE samples (two files per sample, R1.fastq and R2.fastq). Treatment: 3 SE samples and 5 PE samples (same file layout as above).

I have seen people 1) use only R1 from the paired-end samples; 2) align and run DE analysis separately for SE and PE, then combine the results; or 3) align them into separate BAM files but put these BAM files in the same group for DE analysis, for example with cuffdiff.

So which one is better, or is there a better way to work with this kind of dataset?

Thanks!

RNA-Seq • 1.3k views

ADD COMMENT • link updated 2.9 years ago by ATpoint 81k • written 2.9 years ago by ichbinlynn93 ▴ 30

If the SE and PE samples are biological replicates, that is, different samples, then align all files separately and, at the differential expression step, include sequencing type as a batch effect in the analysis.

Are there other hidden batch effects? Were all libraries prepared simultaneously and with the same kit? And so on...

ADD REPLY • link 2.9 years ago by h.mon 35k
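To make the batch-effect suggestion concrete, here is a minimal sketch of a sample sheet built from the counts in the question. The sample names and the column names `condition` and `library_type` are hypothetical. It only checks that every condition contains both library types, which is a simple sanity check before adding sequencing type as a covariate. In formula-based DE tools such as DESeq2, a design along the lines of `~ library_type + condition` would then be a typical way to express this, assuming those column names.

```python
from collections import Counter

# Sample counts taken from the question; all names are hypothetical.
LAYOUT = {
    ("control", "SE"): 3,
    ("control", "PE"): 6,
    ("treatment", "SE"): 3,
    ("treatment", "PE"): 5,
}


def build_sample_sheet(layout):
    """Expand per-cell counts into one row (dict) per sample."""
    sheet = []
    for (condition, library_type), n in layout.items():
        for i in range(1, n + 1):
            sheet.append({
                "sample": f"{condition}_{library_type}_{i}",
                "condition": condition,
                "library_type": library_type,
            })
    return sheet


def cross_tabulate(sheet):
    """Count samples per (condition, library_type) cell."""
    return Counter((row["condition"], row["library_type"]) for row in sheet)


def has_empty_cell(sheet):
    """True if some library type is missing from at least one condition."""
    conditions = {row["condition"] for row in sheet}
    seen = {}
    for row in sheet:
        seen.setdefault(row["library_type"], set()).add(row["condition"])
    return any(found != conditions for found in seen.values())


sheet = build_sample_sheet(LAYOUT)
table = cross_tabulate(sheet)
for cell in sorted(table):
    print(cell, table[cell])
print("empty cell:", has_empty_cell(sheet))
```

With the numbers from the question, both library types occur in both conditions, so the SE/PE effect is not tied to the treatment effect. Other hidden batch effects (kit, preparation date) would need their own columns.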

Thank you. For example, my control has 9 samples in total, 3 SE and 6 PE. The sequencing depth is nearly 25M for SE and ~90M for PE. Within the SE samples the depth is very similar, but the PE samples range from 75M to 92M. I am confirming the prep kits for PE with my manager (the SE one was unstranded).

ADD REPLY • link 2.9 years ago by ichbinlynn93 ▴ 30

I would treat them all as single-end data. That ensures you do not introduce any mapping bias. In DE analysis you can regress out the SE/PE effect, but you never know what you might later need to do with the data beyond that, maybe looking for fusion transcripts based on alignments. You would then probably run into trouble if processing and alignment were not the same across samples: you would have to repeat the alignment treating everything as SE and rerun all the analysis done before that.

ADD REPLY • link 2.9 years ago by ATpoint 81k
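As an illustration of treating everything as single-end, the following sketch picks the FASTQ files that would go to the aligner. It assumes a hypothetical naming scheme in which PE files end in `_R1.fastq` / `_R2.fastq` and SE samples are a single `.fastq` file; real file names may differ.

```python
def single_end_inputs(fastq_files):
    """Return the files to align when all samples are treated as single-end.

    Assumption (hypothetical naming): second mates end in '_R2.fastq'.
    SE files and first mates ('_R1.fastq') are kept, second mates are dropped.
    """
    return [name for name in sorted(fastq_files)
            if not name.endswith("_R2.fastq")]


files = [
    "ctrl_SE_1.fastq",
    "ctrl_PE_1_R1.fastq",
    "ctrl_PE_1_R2.fastq",
]
selected = single_end_inputs(files)
print(selected)  # ['ctrl_PE_1_R1.fastq', 'ctrl_SE_1.fastq']
```

Every selected file is then aligned with the same single-end settings, so SE and PE samples go through identical processing.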

Yes, this has been quite annoying for me. If I treat them all as single-end, then I only need to process the R1.fastq file for the PE samples? Thanks!

ADD REPLY • link 2.9 years ago by ichbinlynn93 ▴ 30

Yes, for SE that would be R1 only. The depth differences are probably negligible, as normalization will take care of them. Check by PCA whether, after normalization, samples are still confounded by (that is, cluster by) initial depth.

ADD REPLY • link 2.9 years ago by ATpoint 81k
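To show why depth differences largely disappear after normalization, here is a toy sketch using counts per million (CPM), the simplest library-size scaling. The gene names and counts are made up. DE tools generally apply more robust schemes (for example median-of-ratios in DESeq2 or TMM in edgeR), so this only illustrates the principle.

```python
def counts_per_million(counts):
    """Scale raw gene counts by library size to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Two made-up samples with the same composition but different depth,
# mimicking a shallow SE library and a deeper PE library.
se_sample = {"geneA": 250, "geneB": 750}
pe_sample = {"geneA": 900, "geneB": 2700}

se_cpm = counts_per_million(se_sample)
pe_cpm = counts_per_million(pe_sample)
print(se_cpm)
print(pe_cpm)
```

Both samples end up with identical CPM values despite a 3.6-fold difference in depth. A PCA on normalized, log-transformed counts would then show whether any depth or SE/PE structure remains.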