Question: Converting unstranded samples to strand specific
21 months ago by
lirongrossmann20 wrote:

Hi all, I have two datasets of RNA-seq samples: one was prepared with a strand-specific protocol (TruSeq) and the other with an unstranded protocol (Clontech’s SMART). I would like to use both datasets to increase the power of my study. I tried batch effect correction, but it did not go well: on a PCA plot I still see two clear groups separated by protocol. Is there a way to account for the difference between the protocols at the mapping or counting level? My understanding is that the principal difference between the two techniques is that the unstranded protocol generates reads from both strands even if only one strand was actually expressed. Is there a way to discard reads from strands that were not expressed by using my stranded dataset (assuming that strands not expressed in the stranded dataset should not be expressed in the unstranded dataset either)? Thanks a lot!

strand rna-seq • 891 views
modified 21 months ago by Friederike4.6k • written 21 months ago by lirongrossmann20
21 months ago by
Friederike4.6k
United States
Friederike4.6k wrote:

I think your title is a bit misleading - you're not actually trying to convert the sample type (which would be impossible, since strandedness is fixed at the time of library preparation). If I understand you correctly, what you want is to filter reads from the unstranded dataset based on information from the stranded dataset.

There are so many issues with that, it's hard to even get started. I am pretty sure you would introduce far more bias this way than you would by simply accounting for the fact that you used two different library preps.

First of all, I don't see how you can justify the assumption "that strands that are not expressed in the strand dataset should not be expressed in the unstranded dataset as well". There are many reasons why you may not detect a transcript (e.g., it was never captured during cDNA synthesis, or it was degraded), and lack of expression is just one of them.

Secondly, you're dealing with randomly fragmented pieces! Just try to envision how you would match the different pieces from the different library preps. I'm not saying it's absolutely impossible, but it does not seem worth pursuing.

I'm sure there are many more details that make this task a rather undesirable one, but I hope these two points already illustrate the magnitude of the problem.

modified 21 months ago • written 21 months ago by Friederike4.6k

Thank you for the detailed answer. I agree with your comments. I may not have been clear about what I would like to achieve with the conversion. I built a model that predicts groups from gene expression using the strand-specific samples. I want to validate the model using the unstranded samples and some of the remaining stranded samples (I don't have many to begin with). I was hoping there would be a way to compare expression levels between the strand-specific and the unstranded samples. It's also worth noting that my alignment algorithm used splice-site orientation, so I was able to infer the strand for the unstranded reads. I know I may be losing a lot of information (such as novel genes), but I am not trying to detect genes, just to compare expression levels for selected genes. Thanks
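For readers unfamiliar with strand inference from splice sites, here is a minimal sketch of the general idea, not the poster's actual aligner. It assumes the intron sequence is given as it appears on the genome's forward strand and only checks the most common GT...AG motif; the function name is hypothetical. Note that a read without a spliced junction gets no strand at all under this scheme.

```python
def infer_strand_from_intron(intron_seq):
    """Guess the transcript strand from an intron's terminal dinucleotides.

    intron_seq is the intron as read on the genome's forward strand.
    Returns "+", "-", or None when the motif is not the canonical GT...AG.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        return None
    first_two, last_two = seq[:2], seq[-2:]
    if (first_two, last_two) == ("GT", "AG"):
        return "+"  # donor GT ... acceptor AG on the forward strand
    if (first_two, last_two) == ("CT", "AC"):
        return "-"  # reverse complement of GT...AG
    return None  # non-canonical or uninformative


plus_call = infer_strand_from_intron("GTAAGTTTTCAG")
minus_call = infer_strand_from_intron("CTGAAAACTTAC")
unknown_call = infer_strand_from_intron("AAAAAAAAAAAA")
```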

modified 21 months ago • written 21 months ago by lirongrossmann20

At least for non-overlapping genes the TPM values should be comparable if the experimental conditions were the same. If you see great differences there, the issue is most likely not just due to the different library prep types.
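A rough sketch of such a comparison, assuming you already have per-gene read counts and gene lengths for one sample from each protocol, restricted to non-overlapping genes. All gene names and numbers below are made-up toy values, and the helper names are hypothetical.

```python
import math


def tpm(counts, lengths_bp):
    """Convert raw read counts to TPM, given gene lengths in base pairs."""
    # Reads per kilobase of gene length
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Scale so that the values sum to one million
    per_million = sum(rpk.values()) / 1e6
    return {g: value / per_million for g, value in rpk.items()}


def log2_ratio(tpm_a, tpm_b, pseudocount=1.0):
    """Per-gene log2(TPM_a / TPM_b), with a pseudocount to avoid log(0)."""
    return {
        g: math.log2((tpm_a[g] + pseudocount) / (tpm_b[g] + pseudocount))
        for g in tpm_a
    }


# Toy data (assumption): non-overlapping genes only
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
stranded_counts = {"geneA": 100, "geneB": 400, "geneC": 50}
unstranded_counts = {"geneA": 200, "geneB": 800, "geneC": 100}

tpm_stranded = tpm(stranded_counts, lengths)
tpm_unstranded = tpm(unstranded_counts, lengths)
ratios = log2_ratio(tpm_stranded, tpm_unstranded)
```

In this toy case the unstranded sample simply has twice the depth, so the TPM values match and the ratios are close to zero. Genes with large ratios in real data would be the ones to inspect first.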

written 21 months ago by Friederike4.6k
21 months ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

What you want to do is fundamentally impossible.

written 21 months ago by Devon Ryan91k

Could you please briefly explain why?

written 21 months ago by lirongrossmann20

The only way to know which strand an unstranded fragment arose from would be to align it and, if it happens to align to a single gene, assume it arose from that gene and not from the opposite strand. Since unstranded reads are slightly more prone to multimapping as is, you'll already be biased by that. Combine that with the bias of assuming that antisense transcription never occurs, and the results become even less correct.
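To illustrate the ambiguity, here is a toy sketch (not any real counting tool) with two hypothetical genes that overlap on opposite strands. Coordinates are half-open intervals, and every name and number is an assumption made up for the example.

```python
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int  # inclusive
    end: int    # exclusive
    strand: str  # "+" or "-"


def candidate_genes(genes: List[Gene], read_start: int, read_end: int,
                    read_strand: Optional[str] = None) -> List[str]:
    """Return the genes a read could have come from.

    read_strand is None for an unstranded library, in which case
    only positional overlap can be used.
    """
    hits = []
    for gene in genes:
        overlaps = read_start < gene.end and read_end > gene.start
        strand_ok = read_strand is None or read_strand == gene.strand
        if overlaps and strand_ok:
            hits.append(gene.name)
    return hits


genes = [
    Gene("sense_gene", 100, 500, "+"),
    Gene("antisense_gene", 300, 700, "-"),
]

# Same read position, with and without strand information
stranded_hits = candidate_genes(genes, 350, 450, read_strand="-")
unstranded_hits = candidate_genes(genes, 350, 450)
```

With strand information the read is assigned to one gene. Without it, both genes remain candidates, and any rule that picks one of them builds in the assumption that the other strand is silent.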

written 21 months ago by Devon Ryan91k