Hi all, I have two RNA-seq datasets: one was prepared with a strand-specific protocol (TruSeq) and the other with an unstranded one (Clontech's SMART). I would like to use both datasets (to increase the power of my study) and tried batch effect correction, but it did not go well (I still see two clear groups separated on the PCA according to the protocol used). Is there a way to account for the difference between the protocols at the mapping/counting level? My understanding is that the principal difference between the two techniques is that the unstranded protocol will generate reads from both strands, even if only one strand was actually expressed. Is there a way to get rid of the reads from strands that were not expressed by using my stranded dataset (assuming that strands that are not expressed in the strand dataset should not be expressed in the unstranded dataset as well)? Thanks a lot!
I think your title is a bit misleading: you're not actually trying to convert the sample type (which would be impossible, since strandedness is determined at the time of library preparation). If I understand you correctly, what you want is to filter reads from the unstranded dataset based on information from the stranded dataset.
There are so many issues with that, it's hard to even get started. I am pretty sure you would introduce way more bias than the library-prep difference you are trying to account for.
First of all, I don't see how you can justify the assumption that "strands that are not expressed in the strand dataset should not be expressed in the unstranded dataset as well". There are many reasons why you may not detect a transcript (e.g., it was never captured during cDNA synthesis, it got degraded, etc.), and lack of expression is just one of them.
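To put a rough number on that first point, here is a minimal sketch. It assumes that the read count for a transcript in one library follows a Poisson distribution, which is a common simplification and not something measured from your data. The function names and the example means are made up for illustration. Under that assumption, a transcript that is genuinely expressed but yields only about one read on average will show zero reads in roughly 37% of libraries.

```python
import math
import random


def prob_zero_reads(expected_reads: float) -> float:
    """Probability of observing 0 reads under a Poisson model (assumed)."""
    return math.exp(-expected_reads)


def simulate_dropout(expected_reads: float, n_libraries: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated libraries in which an expressed transcript gets 0 reads."""
    rng = random.Random(seed)
    threshold = math.exp(-expected_reads)
    zero_count = 0
    for _ in range(n_libraries):
        # Draw one Poisson count with Knuth's method (fine for small means).
        reads, product = 0, rng.random()
        while product > threshold:
            reads += 1
            product *= rng.random()
        if reads == 0:
            zero_count += 1
    return zero_count / n_libraries


if __name__ == "__main__":
    # Hypothetical per-library expected read counts for a lowly expressed transcript.
    for mean in (0.5, 1.0, 3.0):
        print(f"mean={mean}: analytic P(0)={prob_zero_reads(mean):.3f}, "
              f"simulated={simulate_dropout(mean):.3f}")
```

So a zero in the stranded dataset is weak evidence that the strand is not expressed, especially for low-abundance or antisense transcripts.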
Secondly, you're dealing with randomly fragmented pieces! Just try to envision how you would match the different pieces from the different library preps. I'm not saying it's absolutely impossible, but it does not seem worth pursuing.
I'm sure there are many more details that make this task a rather undesirable one, but I hope these two points already illustrate the magnitude of the problem.