Hello everyone. I am currently designing a pipeline that takes in RNA sequencing data and ideally will run the usual steps (trim -> align -> differential expression). After trimming (which I did using Trimmomatic), I end up with both paired-end fastq files and (much smaller) unpaired fastq files.
I was wondering what the best way to proceed with these files is. So far, I've been handling them separately: I aligned them separately using STAR and ran RSEM separately for quantification. At this point, would you recommend that I merge the counts? Is there an easy way to do this? Or is my whole thought process wrong? I'd appreciate any suggestions.
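For anyone wondering what "merging the counts" could look like in practice, here is a rough Python sketch. It assumes the paired and unpaired runs each produced an RSEM `*.genes.results` table (tab-separated, with `gene_id` and `expected_count` columns) and simply adds the expected counts gene by gene. The function names are made up for illustration, and this is only a naive sum, not something RSEM itself provides. TPM and FPKM are normalized within each run, so they cannot be combined this way.

```python
import csv
from collections import defaultdict


def read_expected_counts(handle):
    """Return {gene_id: expected_count} from an open RSEM *.genes.results table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


def sum_expected_counts(*tables):
    """Add expected counts gene by gene; a gene absent from a table adds 0."""
    merged = defaultdict(float)
    for table in tables:
        for gene_id, count in table.items():
            merged[gene_id] += count
    return dict(merged)
```

You would open the two result files, pass each handle to `read_expected_counts`, and hand both dictionaries to `sum_expected_counts`. The summed counts could then go into a count-based differential expression tool, but whether the extra bookkeeping is worth it is a separate question.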
I mainly kept them because of other Biostars questions that seemed to suggest you should keep them if you can, though given the added complexity it might just be better to discard them, like you said. The Biostars questions that I referenced are below:
Should I use unpaired reads from trimmomatic
When is recommended to remove or keep Unpaired "Orphan" reads in downstream analysis?
Thank you for your suggestion.
It is not standard practice to keep them, and those threads don't suggest that you should. They give advice for the case where you have already gone through the hassle of keeping them; in that case, for RSEM, I'd say merge the BAM files and then run RSEM.
I would discard them. There's no need to keep that extra 1% of your reads unless you have some good reason to do so. You're already going to have literally millions (maybe billions) of reads in your dataset; I don't know what good an extra 1% of reads will do you.
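If you want to check how small the unpaired fraction actually is in your own data before tossing it, something like the Python sketch below would do. It assumes standard four-line FASTQ records (plain or gzipped) and the four output files Trimmomatic writes in paired-end mode (forward/reverse paired, forward/reverse unpaired). The function names are just illustrative.

```python
import gzip


def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming exactly four lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def unpaired_fraction(paired_files, unpaired_files):
    """Fraction of surviving reads that lost their mate during trimming."""
    paired = sum(count_fastq_records(p) for p in paired_files)
    unpaired = sum(count_fastq_records(p) for p in unpaired_files)
    total = paired + unpaired
    return unpaired / total if total else 0.0
```

Call `unpaired_fraction` with the two paired files in the first list and the two unpaired files in the second. If the result is around 0.01 or lower, dropping the unpaired reads costs you very little.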
Seconding all this. Unpaired reads after trimming are a negligible fraction, and non-uniform processing adds uncertainty rather than value. Keep it simple and toss them.