Question: Something strange in RNA-seq raw reads: is it contamination or not?
26 days ago by
kamoltip.lao0 wrote:

Hi all,

I'm a beginner in RNA-seq and bioinformatics. I found something strange in my RNA-seq raw reads from an insect. I have 4 samples: A1, A2, B1, and B2. A1 and A2 are biological replicates, as are B1 and B2.

I ran FastQC to assess the quality of the raw reads and got an 'overrepresented sequences' warning for sample B2. I then randomly picked 50,000 reads from each sample and ran blastn. The results were:

Percentage of reads that matched my model in the database: A1: 90%, A2: 93%, B1: 86%, B2: 47%

Percentage of reads with 'No hit': A1: 9%, A2: 4%, B1: 12%, B2: 51%

The rest matched something else.
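For anyone wanting to reproduce the subsampling step, here is a minimal sketch of drawing a fixed number of reads uniformly at random from a FASTQ file in one pass (reservoir sampling). It is not necessarily how the reads above were picked. It assumes the usual 4-lines-per-record FASTQ layout, and the function names are made up for illustration.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes 4 lines per record (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = record
        yield header, seq, qual


def reservoir_sample(records, k, seed=0):
    """Return up to k records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)       # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                sample[j] = rec      # replace with probability k / (i + 1)
    return sample


def to_fasta(records):
    """Format reads as FASTA text, which blastn accepts as a query."""
    return "".join(f">{h[1:].split()[0]}\n{s}\n" for h, s, _ in records)
```

With a real file this could be called as `reservoir_sample(read_fastq(open("B2.fastq")), 50_000)`, where `B2.fastq` is a placeholder name. Fixing the seed makes the subsample reproducible, which helps when comparing samples.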

Clearly, sample B2 has a problem. I checked 100 of the 'No hit' reads and found that part of each read (20-70 bp out of 101 bp) matches bacterial, viral, or fungal sequences at 80-90% identity.
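The "only the middle of the read matches" pattern can be quantified rather than checked by eye. The sketch below assumes blastn was run with the default tabular output (`-outfmt 6`, whose columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that all reads are 101 bp, as stated above. The `min_flank` threshold of 10 bp and the category names are arbitrary choices for illustration.

```python
import csv
import io

READ_LEN = 101  # read length stated in the post


def best_hits(tabular_text):
    """Keep the highest-bitscore hit per read from BLAST tabular (outfmt 6) text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qstart, qend = sorted((int(row[6]), int(row[7])))
        hit = {
            "sseqid": row[1],
            "pident": float(row[2]),
            "qstart": qstart,
            "qend": qend,
            "bitscore": float(row[11]),
        }
        if row[0] not in best or hit["bitscore"] > best[row[0]]["bitscore"]:
            best[row[0]] = hit
    return best


def classify(hit, read_len=READ_LEN, min_flank=10):
    """Label a hit by where the aligned part sits within the read."""
    left = hit["qstart"] - 1          # unaligned bases before the match
    right = read_len - hit["qend"]    # unaligned bases after the match
    coverage = (hit["qend"] - hit["qstart"] + 1) / read_len
    if left >= min_flank and right >= min_flank:
        kind = "internal"             # both ends of the read are unaligned
    elif left < min_flank and right < min_flank:
        kind = "full"                 # (nearly) the whole read aligns
    else:
        kind = "end-anchored"         # one end aligns, the other does not
    return kind, round(coverage, 2)
```

Tallying the labels per sample gives a number that can be reported to the sequencing provider, for example the fraction of B2 hits that are "internal" compared with B1.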

I understand that these might come from an endosymbiont, but why are the bacterial, viral, or fungal sequences inserted in the middle of my reads, and why doesn't the replicate (B1) show the same pattern?

Can someone explain what happened to sample B2? Is it contamination? What should I do?

Thank you in advance for your kind help,

rna-seq • 162 views
modified 26 days ago by shawn.w.foley (540) • written 26 days ago by kamoltip.lao (0)

Based on your description this does not quite sound like it is a bioinformatics issue.

Have you gone back and checked your experimental records to see if an obvious problem may have been overlooked? You would want to focus on pre-library RNA QC and library QC. Since it only affects one of your biological replicates, it does not seem to be a systematic issue with condition B. It is also possible that the sample just failed somewhere along the process.

written 26 days ago by genomax (67k)

Thank you for your suggestion. I have checked the pre-library RNA QC and library QC reports, and the quality of B2 is similar to that of the other samples, with a RIN of 10. I prepared the total RNA and sent it to a sequencing company, which performed library preparation and sequencing for me. Is it possible that something happened during library preparation or sequencing?

written 25 days ago by kamoltip.lao (0)

Is it possible that something happened during library preparation or sequencing?

If anything, it would likely be during library prep. Start by contacting your sequencing provider, explain your results, and go from there.

written 25 days ago by genomax (67k)

I already contacted them. They said this issue is not contamination, but they could not explain why.

written 25 days ago by kamoltip.lao (0)

Are those matches of B2 to ribosomal genes, by chance?

written 26 days ago by WouterDeCoster (38k)

Yes. When I looked at the 47% of reads that matched my model, some of them matched ribosomal genes. Among the 'No hit' reads, I also found ribosomal genes of other organisms, but just 20-70 bp in the middle of 101 bp reads are matched. Have you ever experienced an issue like this?

Thank you for your reply

written 25 days ago by kamoltip.lao (0)

but just 20-70 bp in the middle of 101 bp reads are matched.

That definitely sounds odd. I would suggest contacting the sequencing company and letting them know about your observations. See if they are willing to troubleshoot this with you.

written 25 days ago by genomax (67k)
26 days ago by
shawn.w.foley540 wrote:

From your description, it sounds like you have some contaminating material in sample B2. I don't know enough about the system to comment on how this happened, but intuitively I'd guess that bacteria/fungi were introduced at some point in the wet lab. As genomax said, it doesn't look like a systematic error or phenotype so much as bad luck.

It would be worth performing the mapping and running a PCA to see how your samples cluster; it's possible that this contamination does no more than reduce the effective sequencing depth. You can get some preliminary analyses out of these samples, but repeating the experiment, or at the very least repeating the B sample, would be the best way to proceed. The current data could be a useful in-house resource for hypothesis generation (with a big asterisk because of the contamination), but the experiment would need to be repeated before publication.
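As a rough sketch of the PCA step, assuming you already have a genes x samples matrix of raw counts from your mapping and counting tool of choice: the code below scales each sample to counts per million first, so that a sample with reduced effective depth is not separated by library size alone, then log-transforms and runs PCA via SVD. The function name and the toy numbers are invented for illustration; dedicated RNA-seq packages offer more careful normalisation.

```python
import numpy as np


def pca_samples(counts, n_components=2):
    """PCA of samples from a genes x samples matrix of raw counts.

    Returns (scores, explained): scores has one row per sample, and
    explained holds the fraction of variance for each returned component.
    """
    counts = np.asarray(counts, dtype=float)
    # Counts per million: removes differences in sequencing depth.
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    x = np.log2(cpm + 1.0).T           # samples as rows, genes as columns
    x = x - x.mean(axis=0)             # center each gene
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Made-up counts for 4 genes; columns are A1, A2, B1, B2.
toy_counts = [
    [100, 110, 10, 12],
    [50, 55, 200, 190],
    [30, 28, 31, 29],
    [5, 6, 80, 75],
]
scores, explained = pca_samples(toy_counts)
for name, (pc1, pc2) in zip(["A1", "A2", "B1", "B2"], scores):
    print(f"{name}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
```

If B2 sits close to B1 and away from the A samples on the leading components, that would be consistent with the contamination mostly costing depth; if B2 is an outlier on its own, it is probably affecting the expression estimates too.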

written 26 days ago by shawn.w.foley (540)

I'll perform the PCA to see the pattern, as you suggested. I plan to prepare a new sample B, so it would help to know the possible cause so that I can avoid it next time.

Thank you so much

written 25 days ago by kamoltip.lao (0)

If you just do B on its own, is that not going to add batch effects that may be difficult to deal with?

modified 25 days ago • written 25 days ago by genomax (67k)

It would be better to sequence both conditions in the same batch, right? If I cannot collect enough material for replicates and can run only 1 sample per condition, is that enough? My samples are very difficult to collect, which is why I could prepare only 2 replicates per condition in the first batch.

Thanks again for your help

written 24 days ago by kamoltip.lao (0)

More replicates are always better, but you should be alright with repeating one rep for each condition. When you do the analysis just be sure to perform a batch correction so you can control for the fact that rep1 came from your first batch and rep2 came from the second.
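To illustrate what controlling for batch means, one common approach is to include batch as a covariate in the model instead of adjusting the counts beforehand. The sketch below fits `y ~ 1 + condition + batch` per gene by ordinary least squares on log-scale expression. It is a bare-bones illustration with made-up numbers and a hypothetical function name; for real data a dedicated differential-expression package with a batch term in its design would normally be the better choice.

```python
import numpy as np


def condition_effect(log_expr, condition, batch):
    """Per-gene condition effect from the model  y ~ 1 + condition + batch.

    log_expr: genes x samples matrix of log-scale expression values.
    condition, batch: 0/1 labels, one per sample, in column order.
    """
    y = np.asarray(log_expr, dtype=float).T            # samples x genes
    design = np.column_stack([
        np.ones(len(condition)),                        # intercept
        np.asarray(condition, dtype=float),             # 0 = A, 1 = B
        np.asarray(batch, dtype=float),                 # 0 = first batch, 1 = second
    ])
    if np.linalg.matrix_rank(design) < design.shape[1]:
        raise ValueError("condition and batch are fully confounded")
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]                                      # condition coefficient per gene


# Made-up log expression for 2 genes; columns: A rep1, B rep1, A rep2, B rep2.
toy_expr = [
    [5.0, 7.0, 6.0, 8.0],   # condition shifts by 2, batch shifts by 1
    [3.0, 3.0, 5.0, 5.0],   # batch shifts by 2, no condition effect
]
effect = condition_effect(toy_expr, condition=[0, 1, 0, 1], batch=[0, 0, 1, 1])
print(effect)
```

The rank check is there because the two effects can only be separated when each batch contains both conditions. If every B sample came from one batch and every A sample from the other, no model could tell condition from batch.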

written 20 days ago by shawn.w.foley (540)
Powered by Biostar version 2.3.0