Hi all,
I had the bad luck that my mutant was contaminated with wt plants, ca. 20%. I had already used that material for RNA-seq and obtained the data. Is there any good way to remove the wt contamination from the mutant data? I appreciate your help!
Let me outline the problem. You have 80% mutant and 20% WT in your sample. Assuming the simplest possible scenario, your data now has 80% of the reads from the mutant and 20% from the WT. That should be enough of a difference to identify differences between WT and mutant if you already have pure WT data. You should be able to see enrichment of certain transcripts.
Frankly, the assumption about the composition might not hold at all, for instance if your RNA isolation procedure works better in the mutant than in the WT, or vice versa. I have seen this happen. In my most current work, this is certainly the case. You could perhaps get a handle on how variable the read proportions are in an 80:20 mutant/WT library, using qPCR. On the other hand, if you are going to put work in to making several libraries, why not sequence?
If the questions you are interested in require you to identify which transcripts are present in the WT but not in the mutant, then obviously a transcript with a high level of support in your mixed sample might have come from the WT and not from your mutant.
In general, cleaning or correcting your data requires knowing what is 'supposed to' happen to begin with. In novel situations where you are performing a genuine experiment, you have no idea what the 'true' signal is supposed to look like. So manipulations are forever suspect.
Essentially, you should be able to get away with doing a comparative analysis between WT and your other sample, which is enriched for mutant transcripts. I don't think there is anything you can do about the contamination without making things worse.
If the goal is to compare mutant and wild type samples, then the answer is most likely no. It is very unlikely that you could devise a method that would reliably distinguish between the two signals.
Of course, there is always the possibility that you could still get some good results, but they would likely be far less reliable and would require a lot more effort than otherwise.
Just do it again, this time paying attention to the process, and use your first dataset as a sanity check for the results obtained the second time around.
Hi George, thanks for your reply! My experiment includes untreated mutant (UM), untreated wt (UW), treated mutant (TM) and treated wt (TW). Can I do the calculation for each gene like this: Gene1 FPKM (UM) minus FPKM (UW)20%, Gene1 FPKM (TM) minus FPKM (TW)20%, and then look at the fold change of Gene1 induced by the treatment? Also, I'm not sure what you mean by 'if you are going to put work in to making several libraries, why not sequence?'
I was trying to say that it might be helpful to estimate the ratio of WT to mutant in your library using qPCR. If you still have some of the library you sequenced, you could do it that way. Otherwise, if you don't have the original sample that you sequenced, you could make libraries with the same composition as your contaminated sample, do qPCR on them, and get an estimate of how much of the RNA in your sample comes from WT.
I calculated my contamination from the RNA-seq data, by comparing the reads mapped to the region of the target gene downstream of the T-DNA insertion.
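For anyone wanting to reproduce that kind of estimate, here is a minimal sketch of one way it could be done. It is not necessarily the exact calculation used above. It assumes that a pure mutant gives no reads downstream of the insertion, that the WT plants in the mixed sample express the target gene at the same level as the pure WT sample, and that scaling by total mapped reads is an adequate normalization. The function name and the read counts are made up for illustration.

```python
def estimate_wt_fraction(mixed_reads, mixed_total, wt_reads, wt_total):
    """Estimate the WT fraction of a mixed library from reads mapped
    downstream of the T-DNA insertion in the target gene.

    Assumptions (hypothetical, see text): the pure mutant contributes no
    reads to this region, and the gene is expressed equally in the WT
    plants of both samples.
    """
    if mixed_total <= 0 or wt_total <= 0:
        raise ValueError("total mapped read counts must be positive")
    # Normalize each count to reads per million mapped reads.
    mixed_cpm = mixed_reads / mixed_total * 1e6
    wt_cpm = wt_reads / wt_total * 1e6
    if wt_cpm == 0:
        raise ValueError("no reads in the WT sample, cannot estimate")
    # Signal remaining in the mixed sample, relative to pure WT.
    return mixed_cpm / wt_cpm


# Made-up counts: 40 reads in the mixed library vs. 200 in pure WT,
# both libraries with 20 million mapped reads.
fraction = estimate_wt_fraction(40, 20_000_000, 200, 20_000_000)
print(f"estimated WT fraction: {fraction:.2f}")
```

Note that this gives the WT share of the reads for that one gene, which is not necessarily the share of WT plants in the sample.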
Let me just repeat: your calculation of "Gene1 FPKM (UM) minus FPKM (UW)20%" only makes sense if putting 20% WT and 80% mutant organisms in your original sample means you get 20% WT RNA and 80% mutant RNA. Do they both express total RNA at the same level? Do you know if it's equally easy to isolate RNA from one versus the other? Do they both amplify by PCR with equal efficiency? I usually work with DNA-seq. In my current work, if I mixed 20% of my control and 80% of my treatment group, I DO NOT get 20% reads from control and 80% from treatment.
I see your points. If I cannot find a reliable method to filter out the contamination, then I will have to discard this part of the data. Doing it again is the best choice of course, but the decision will be made by my boss. >_
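To make the subtraction under discussion concrete, here is a rough sketch of what it would look like if the linear-mixing assumption did hold, i.e. if mixed = 0.8 x mutant + 0.2 x WT for every gene. Under that assumption the subtraction alone is not enough: the result also has to be divided by 0.8 to get back to the mutant level. The function names, the pseudocount and the example values are illustrative only, and none of this addresses the concerns about RNA yield or amplification raised above.

```python
def unmix_fpkm(mixed, wt, wt_fraction=0.2):
    """Back out the mutant FPKM from a mixed sample, ASSUMING
    mixed = (1 - wt_fraction) * mutant + wt_fraction * wt."""
    if not 0 <= wt_fraction < 1:
        raise ValueError("wt_fraction must be in [0, 1)")
    corrected = (mixed - wt_fraction * wt) / (1 - wt_fraction)
    # Noise can push the estimate below zero; clamp it.
    return max(corrected, 0.0)


def treatment_fold_change(um, uw, tm, tw, wt_fraction=0.2, pseudocount=1.0):
    """Treated/untreated fold change in the mutant after unmixing.

    um, uw, tm, tw are the FPKM values of one gene in the untreated
    mixed 'mutant', untreated WT, treated mixed 'mutant' and treated WT
    samples. The pseudocount avoids division by zero.
    """
    untreated = unmix_fpkm(um, uw, wt_fraction)
    treated = unmix_fpkm(tm, tw, wt_fraction)
    return (treated + pseudocount) / (untreated + pseudocount)


# Made-up FPKM values for a single gene.
print(treatment_fold_change(um=10.0, uw=10.0, tm=34.0, tw=10.0))
```

Genes where the corrected value gets clamped to zero, or where a small change in wt_fraction changes the result a lot, are the ones where this kind of correction is least trustworthy.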