Question: Converting unstranded samples to strand specific
2.1 years ago by
lirongrossmann20 wrote:

Hi all, I have two datasets of RNA-seq samples: one was prepared with a strand-specific protocol (TruSeq) and the other with an unstranded protocol (Clontech's SMART). I would like to use both datasets (to increase the power of my study) and tried batch effect correction, but it did not go well (I still see two clear groups separated on PCA according to the protocol used). Is there a way to account for the difference between the protocols at the mapping/counting level? My understanding is that the principal difference between the two library types is that the unstranded one will generate reads from both strands, even if only one strand was actually expressed. Is there a way to get rid of the reads from strands that were not expressed by using my stranded dataset (assuming that strands that are not expressed in the strand dataset should not be expressed in the unstranded dataset as well)? Thanks a lot!

2.1 years ago by
Friederike5.2k
United States
Friederike5.2k wrote:

I think your title is a bit misleading: you're not actually trying to convert the sample type (which would be impossible, since strandedness is fixed at the time of library preparation). If I understand you correctly, what you want is to filter reads from the unstranded dataset based on information from the stranded dataset.

There are so many issues with that, it's hard to even get started. I am pretty sure you would introduce far more bias this way than you would by simply accounting for the fact that you used two different library preps.

First of all, I don't see how you can justify the assumption "that strands that are not expressed in the strand dataset should not be expressed in the unstranded dataset as well". There are many reasons why you may not detect a transcript (e.g., it was never captured during cDNA synthesis, or it was degraded), and lack of expression is just one of them.

Secondly, you're dealing with randomly fragmented pieces! Just try to envision how you would match the different pieces from the different library preps. I'm not saying it's absolutely impossible, but it does not seem worth pursuing.

I'm sure there are many more details that make this task a rather undesirable one, but I hope these two points already illustrate the magnitude of the problem.


Thank you for the detailed answer. I agree with your comments. I may not have been clear about what I would like to achieve with the conversion. I built a model to predict groups based on their gene expression using the strand-specific samples. I want to validate my model using the unstranded samples and some of the remaining stranded samples (I don't have many to begin with). I was hoping there is a way to compare the expression levels between the strand-specific samples and the unstranded samples. Also, it's worth noting that my alignment algorithm used splice site orientation, so I was able to infer the strand for the unstranded reads. I know I may be losing a lot of information (such as novel genes, etc.), but I am not trying to detect genes, just to compare expression levels for selected genes. Thanks
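To make the splice-site idea concrete, here is a minimal sketch, assuming canonical GT-AG introns only. The function name `strand_from_intron` is hypothetical and this is not how any particular aligner implements it. It also shows the limitation: only reads that span an intron carry this signal, so unspliced reads remain unstranded.

```python
def strand_from_intron(intron_seq):
    """Guess the transcript strand from an intron's terminal dinucleotides.

    intron_seq is the intron sequence as read on the reference forward strand.
    Assumption: only the canonical GT...AG motif is considered.
    Returns '+', '-', or None when the motif is not recognised.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        return None
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "+"  # canonical motif seen directly: plus-strand transcript
    if (donor, acceptor) == ("CT", "AC"):
        return "-"  # reverse complement of GT...AG: minus-strand transcript
    return None     # non-canonical or unspliced: strand stays unknown


print(strand_from_intron("GTAAGTTTTCAG"))
print(strand_from_intron("CTGAAAACTTAC"))
print(strand_from_intron("AAAATTTT"))
```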

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by lirongrossmann20

At least for non-overlapping genes the TPM values should be comparable if the experimental conditions were the same. If you see great differences there, the issue is most likely not just due to the different library prep types.
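As a rough illustration of that comparison, the sketch below computes TPM from raw counts and gene lengths so that values from the two library types can be put side by side. The gene names, counts, and lengths are made up for the example, and the `tpm` helper is hypothetical; restricting the input to non-overlapping genes is left to the caller.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to TPM.

    counts: dict of gene -> read count
    lengths_bp: dict of gene -> gene (or effective) length in base pairs
    """
    # Reads per kilobase: normalise each count by gene length
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Scale so that the values of one sample sum to one million
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


# Made-up numbers for two non-overlapping genes in one sample
example_counts = {"geneA": 100, "geneB": 200}
example_lengths = {"geneA": 1000, "geneB": 2000}
example_tpm = tpm(example_counts, example_lengths)
print(example_tpm)
```

Running the same function on a stranded and an unstranded sample from the same condition and plotting the two TPM vectors against each other is one simple way to check whether the selected genes agree.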

ADD REPLYlink written 2.1 years ago by Friederike5.2k
2.1 years ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

What you want to do is fundamentally impossible.


Could you please briefly explain why?

ADD REPLYlink written 2.1 years ago by lirongrossmann20

The only way to know which strand an unstranded fragment arose from would be to align it and, if it happens to align to a single gene, assume it arose from that gene and not from the opposite strand. Since unstranded reads are slightly more prone to multimapping as is, you'll already be biased by that. Combined with the bias of assuming that antisense transcription never occurs, this will further compound the incorrectness of the results.
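A toy example of the ambiguity described above, assuming two hypothetical genes on opposite strands that overlap, and half-open coordinates. The `Gene` type and `candidate_genes` function are illustrative only and are not taken from any counting tool.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int   # half-open interval [start, end)
    end: int
    strand: str  # '+' or '-'


def candidate_genes(read_start, read_end, read_strand: Optional[str], genes):
    """Return the genes a read could have come from.

    read_strand is '+' or '-' for a stranded library,
    or None when the library is unstranded.
    """
    hits = []
    for gene in genes:
        overlaps = read_start < gene.end and gene.start < read_end
        strand_ok = read_strand is None or read_strand == gene.strand
        if overlaps and strand_ok:
            hits.append(gene.name)
    return hits


genes = [Gene("geneA", 100, 500, "+"), Gene("geneB", 400, 900, "-")]

# Stranded read: the strand resolves the overlap
print(candidate_genes(420, 470, "+", genes))
# Unstranded read at the same position: both genes remain candidates
print(candidate_genes(420, 470, None, genes))
```

Outside the overlap the unstranded read still maps to one gene only, but assigning it there silently assumes no antisense transcription at that locus.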

ADD REPLYlink written 2.1 years ago by Devon Ryan92k