How to handle paired and unpaired reads in RNAseq pipeline
8 days ago

Hello everyone. I am currently designing a pipeline that takes in RNA sequencing data and, ideally, runs the usual steps (trim -> align -> differential expression). After trimming with Trimmomatic, I end up with both paired-end FASTQ files and (much smaller) unpaired FASTQ files.

I was wondering what the best way to proceed with these files is. So far, I've been handling them separately: I aligned them separately with STAR and ran RSEM separately for quantification. At this point, would you recommend that I merge the counts? Is there an easy way to do this? Or is my whole thought process wrong? I'd appreciate any suggestions.

rnaseq • 322 views
8 days ago
dsull ★ 7.2k

You should just not use the unpaired FASTQ files… it’s like 1% of your reads; I don’t know why you’d care about preserving them. It adds unneeded complexity to pipelines.


I mainly kept them because other Biostars questions seemed to suggest that you should keep them if you can. Given the added complexity, though, it might be better to discard them as you said. The Biostars questions I was referring to are below:

Should I use unpaired reads from trimmomatic
When is recommended to remove or keep Unpaired "Orphan" reads in downstream analysis?

Thank you for your suggestion.


It is not standard practice to keep them, and those threads don't suggest that you should keep them (they give suggestions for the case where you have already gone through the hassle of keeping them; in that case, for RSEM, I'd say merge the BAM files and then run RSEM).

I would discard them. There's no need to keep that extra 1% of your reads unless you have some good reason to do so. You're already going to have literally millions (maybe billions) of reads in your dataset; I don't know what good an extra 1% of reads will do you.
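
If you want to check what fraction of your own data the unpaired reads make up before discarding them, here is a minimal sketch. It assumes standard four-line FASTQ records (plain or gzipped) and counts each surviving pair once via the R1 paired file. The function names and any file names are hypothetical and not part of any tool mentioned in this thread.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    # Assumption: gzipped files are recognized by a ".gz" suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def orphan_fraction(paired_r1, unpaired_files):
    """Fraction of surviving fragments that lost their mate during trimming.

    paired_r1: the R1 file of the paired output (one record per surviving pair).
    unpaired_files: the unpaired (orphan) output files, usually one per mate.
    """
    n_pairs = count_fastq_reads(paired_r1)
    n_orphans = sum(count_fastq_reads(f) for f in unpaired_files)
    total = n_pairs + n_orphans
    return n_orphans / total if total else 0.0
```

For example, `orphan_fraction("sample_R1.paired.fastq.gz", ["sample_R1.unpaired.fastq.gz", "sample_R2.unpaired.fastq.gz"])` (placeholder file names) returns a value between 0 and 1. If it comes out around 0.01, dropping the unpaired files costs you very little.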


Seconding all this. Unpaired reads after trimming are a negligible fraction, and non-uniform processing adds uncertainty rather than value. Keep it simple and toss them.
