Question: Plant virus sequences found in a human brain RNA-seq sample: how to evaluate them?
4.4 years ago by
Tao370 wrote:

Hi guys,

I found some plant virus sequences in a human brain sample through RNA-seq analysis. I know it sounds odd, but I did find some evidence in the literature that plant viruses may be able to infect humans. There is even a very recent review paper discussing "Can plant virus cross the kingdom border and be pathogenic to human". My question is about how to evaluate the virus. The mapping analysis only shows that some reads map to a virus genome; whether the virus was actually present in my brain sample is still an unanswered question. Does anyone have good ideas for evaluating this?

My initial idea is to look for fusion (integration) sites, but what if the virus did not integrate into the human genome?

My pipeline is as follows:

First, map the RNA-seq reads against the human reference. Second, align the unmapped reads against all virus genomes. Third, extract the highly expressed viruses according to PPM or RPKM.

For example, in one sample with about 60,000,000 reads that did not map to the human reference, 470,000 reads map to 'Poinsettia mosaic virus', which is a plant virus. It does not look like contamination to me, because this virus is not detected in the other samples.
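To put those numbers in perspective, here is a minimal sketch of the normalisation in Python. The read counts are the ones quoted above; the genome length of 6,000 bp is only a placeholder and should be replaced with the length of the reference actually used. Note that the denominator here is the unmapped read count, not the whole library, so the values are not directly comparable to a standard RPKM.

```python
def reads_per_million(mapped_reads: int, total_reads: int) -> float:
    """Reads assigned to one reference per million reads in the denominator set."""
    return mapped_reads / total_reads * 1_000_000


def rpkm(mapped_reads: int, total_reads: int, genome_length_bp: int) -> float:
    """Reads per kilobase of reference per million reads in the denominator set."""
    kilobases = genome_length_bp / 1_000
    millions = total_reads / 1_000_000
    return mapped_reads / kilobases / millions


unmapped_reads = 60_000_000   # reads that did not map to human (from the post)
viral_reads = 470_000         # reads mapped to the virus (from the post)
genome_length_bp = 6_000      # ASSUMPTION: placeholder, not taken from the post

fraction = viral_reads / unmapped_reads
print(f"fraction of unmapped reads: {fraction:.2%}")                              # 0.78%
print(f"RPM over unmapped reads: {reads_per_million(viral_reads, unmapped_reads):.2f}")  # 7833.33
print(f"RPKM (placeholder length): {rpkm(viral_reads, unmapped_reads, genome_length_bp):.2f}")
```

Running the same calculation for every sample in the batch makes it easy to see whether the signal really is confined to one sample.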

Any advice is appreciated.



rna-seq viruses human brain • 1.8k views
Modified 4.1 years ago • written 4.4 years ago by Tao (370)

Maybe edit the title? Poinsettia mosaic virus is in the family Tymoviridae.

Reply written 4.3 years ago by pld (4.8k)

Sorry for the late reply. I changed it to plant virus.

Reply written 4.1 years ago by Tao (370)
4.4 years ago by
New York, USA
reza.jabal370 wrote:

Hi Tao,

This is really interesting, but I remember that a while ago we had the same issue with our exome batch of human tissues, which later turned out to be contamination. Before rushing to any conclusion, make sure your sequencing service provider has not recently sequenced any plant samples, as that might be the cause.

You can also chase up the other samples sequenced on the same plate to see if anyone else found the same viral genome.

Modified 4.3 years ago • written 4.4 years ago by reza.jabal (370)

Thanks so much for your kind reply. I absolutely agree, and I will be more careful about the contamination problem. Besides, I found several papers which might be helpful for this question. Please see my own comments. Thanks again.



Reply written 4.3 years ago by Tao (370)
4.3 years ago by
Tao370 wrote:

Hi all, 

Thank you for your reviews and answers. I will try to answer this question myself.

First, contamination. In this case, just as @reza.jabal mentioned, you should check whether any plant samples had been sequenced on the same sequencer, or check whether other samples also contain this virus.

Second, try to find the fusion site, which would be strong evidence to support your discovery. Tools like ViralFusionSeq and VirusFinder, both published in 2013, might be helpful.

About the "unmapped reads" question, I found some papers discussing it which are worth reading if you are interested.

Rapid identification of non-human sequences in high-throughput sequencing datasets.  Bioinformatics, 2012.

What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual. BMC genomics, 2015.

Feel free to add more references. Thanks!



Written 4.3 years ago by Tao (370)

I'd say there's not much of a reason to do any more analysis for now; this is a strong signal. As Reza said, your follow-ups now should be to investigate potential contamination.

Your best bet would be to extract some fresh RNA and use PCR to see if you can detect the virus there. After that you should check the RNA used for sequencing as well as any cDNA library that might be remaining. This will help you determine if and when your samples were contaminated. Who did the RNA extractions and where?

You should also see if anyone did any library prep during/before your samples, or if there was any sequencing done on plant samples in parallel or before your samples.

If you're going to try to go down the road of claiming a plant virus can infect humans, you're going to need much, much more evidence. This one sample alone won't be enough.

Reply modified 4.3 years ago • written 4.3 years ago by pld (4.8k)

Thanks for your nice suggestions, Joe. As you said, I will be very careful about potential contamination before claiming anything. In the past few weeks, I found some reads which seem to contain a fusion site. For example, for some read pairs, one end maps to human and the other maps to the virus. There are also some singletons (75 bp) where part of the read maps to human and the other part maps to the virus. I double-checked these mapping results using BLAST against the nt database, which showed high specificity. What do you think about those chimeric reads or read pairs? Thanks! Tao.
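As a rough illustration of how such pairs could be tallied, here is a minimal sketch in Python. It assumes the mate alignments have already been reduced to (read name, reference of mate 1, reference of mate 2) tuples; the reference names `chr1`, `chr7` and `PnMV` and the function names are hypothetical. Chimeric pairs can also arise as library preparation artifacts, so a count like this is only a starting point: breakpoints supported by many independent pairs at the same position would be more convincing than scattered ones.

```python
from collections import Counter


def classify_pair(ref1: str, ref2: str, viral_refs: set) -> str:
    """Label a read pair by where its two mates aligned ('*' means unmapped)."""
    if ref1 == "*" or ref2 == "*":
        return "unmapped_mate"
    mate1_viral = ref1 in viral_refs
    mate2_viral = ref2 in viral_refs
    if mate1_viral and mate2_viral:
        return "viral"
    if mate1_viral or mate2_viral:
        return "chimeric"  # one mate on the virus, the other on the host
    return "host"


def count_pair_classes(pairs, viral_refs) -> Counter:
    """Count pair labels over an iterable of (name, ref1, ref2) tuples."""
    return Counter(classify_pair(r1, r2, viral_refs) for _, r1, r2 in pairs)


# Toy input with hypothetical reference names.
toy_pairs = [
    ("pair1", "chr1", "chr1"),
    ("pair2", "PnMV", "PnMV"),
    ("pair3", "chr7", "PnMV"),
    ("pair4", "PnMV", "*"),
]
print(count_pair_classes(toy_pairs, {"PnMV"}))
```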

Reply written 4.1 years ago by Tao (370)
4.3 years ago by
planet earth
piet1.7k wrote:

I recently uncovered a few contigs unambiguously matching Mumps virus in a bacterial DNA sequencing project (Illumina HiSeq 2500). Pure DNA free of RNA had been prepared from a single bacterial culture and sent for sequencing. Mumps virus is an RNA virus, so an RNA isolation and reverse-transcription protocol would be required to get DNA for sequencing. Thus it was unlikely that the contamination happened in our lab during bacterial growth and harvesting.

The reads mapping to the Mumps virus contigs showed an average insert size of about 220 nt, while the average insert size of the bacterial reads was about 290 nt. This strongly indicates that the Mumps fragments belong to a different library, and that the contamination presumably took place after library preparation.
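A comparison like this can be sketched in a few lines of Python. The example assumes the alignments have been reduced to (reference name, template length) tuples, where the template length is the signed TLEN value from column 9 of a SAM record (0 when it is not available). The reference names and numbers in the toy data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def insert_size_by_group(records, suspect_refs):
    """Summarise absolute template lengths for suspect vs. host references.

    records: iterable of (reference_name, template_length) tuples.
    suspect_refs: set of reference names belonging to the suspected contaminant.
    """
    groups = defaultdict(list)
    for ref, tlen in records:
        if tlen == 0:  # TLEN is 0 when the template length is undefined
            continue
        key = "suspect" if ref in suspect_refs else "host"
        groups[key].append(abs(tlen))
    return {
        key: {"n": len(sizes), "mean": mean(sizes), "median": median(sizes)}
        for key, sizes in groups.items()
    }


# Toy data with hypothetical reference names.
toy_records = [
    ("bacterial_contig", 290),
    ("bacterial_contig", -288),
    ("mumps_contig", 220),
    ("mumps_contig", -222),
    ("bacterial_contig", 0),
]
print(insert_size_by_group(toy_records, {"mumps_contig"}))
```

A clear shift between the two distributions, as in the Mumps case, would point to fragments from a different library; similar distributions would not rule contamination out, since it could also have happened before library preparation.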

About 700 read pairs mapped to the Mumps virus out of a total of about 7.5 million read pairs. It is really amazing to see how sensitive deep sequencing is, even for minor contaminants.

Written 4.3 years ago by piet (1.7k)

Hi Piet, sorry for my late reply. That's a very interesting case. I learned from you that I should also check the average insert size first. Actually, in our RNA-seq sample we found a very strong virus signal; we even assembled the whole viral genome. We also found some fusion-like read pairs, meaning that one end maps to human and the other end maps to the virus. I'm still working on the project, so if you have any ideas please feel free to tell me. Thanks! Tao.

Reply written 4.1 years ago by Tao (370)
4.1 years ago by
ablanchetcohen1.2k wrote:

This claim reminds me of the paper "A Bacterium That Can Grow by Using Arsenic Instead of Phosphorus".

I am highly skeptical, but my skepticism sometimes prevents me from advancing in my scientific career. In another thread on this same forum, I took a paper published in Science on April 1st, "A programming language for living cells", for an April Fools' Day joke, given its somewhat preposterous claims.

Still, your analysis pipeline is unconvincing to put it mildly. You make no mention of your tolerance for mismatches, insertions and deletions, or control of base quality.

Before even pretending that you may have made such an earth-shattering discovery, you should at the very least meet the following criteria.

  1. The bases of the aligned reads should be of excellent quality.
  2. The reads should be of reasonable length.
  3. There should be no tolerance for mismatches, insertions or deletions in the alignment.
  4. The sequences should be unique to the plant virus, and not found anywhere in the human genome.
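The first three criteria can be checked mechanically on SAM records. Below is a minimal sketch in plain Python; the thresholds (50 nt minimum length, mean Phred quality of 30) are illustrative assumptions rather than values from this thread, and the reference name in the toy record is hypothetical. It assumes Phred+33 quality encoding and that the aligner writes the `NM` (edit distance) tag. The fourth criterion is not covered here; it needs a separate search of the candidate reads against the human reference.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean base quality of a Phred+33 encoded quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_strict_filter(sam_line: str, min_len: int = 50,
                         min_mean_q: float = 30.0) -> bool:
    """True if a SAM record is a full-length, gap-free, mismatch-free alignment."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar, seq, qual = int(fields[1]), fields[5], fields[9], fields[10]
    if flag & 4:                       # 0x4 = read is unmapped
        return False
    if len(seq) < min_len:             # criterion 2: reasonable length
        return False
    if qual == "*" or mean_phred(qual) < min_mean_q:   # criterion 1: quality
        return False
    ops = CIGAR_RE.findall(cigar)
    # Criterion 3: only match operations, so no indels, clipping or skips.
    if any(op not in "M=" for _, op in ops):
        return False
    if sum(int(n) for n, _ in ops) != len(seq):
        return False
    # NM:i:0 means zero edit distance to the reference (no mismatches).
    nm = next((f for f in fields[11:] if f.startswith("NM:i:")), None)
    return nm is not None and int(nm[5:]) == 0


# Toy record: 50 nt read, all bases Q40, perfect match.
toy = "\t".join(["r1", "0", "PnMV", "100", "60", "50M", "*", "0", "0",
                 "A" * 50, "I" * 50, "NM:i:0"])
print(passes_strict_filter(toy))
```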

If you still come to the same conclusions with all this extra validation, you still need further experimental results to validate this discovery worthy of a Nature paper. You could start with qPCR.

Somehow, I think you will find a flaw in your methodology before getting to publish your Nature paper, but I have been wrong before.

Modified 4.1 years ago • written 4.1 years ago by ablanchetcohen (1.2k)

Hi ablanchetcohen, thank you for your suggestions. Maybe the title I chose led to a misunderstanding, so I modified it again. I'm not going to claim the finding of a plant virus in human brain here. The fact is that I found some virus reads, actually enough to assemble its whole genome, in one of our RNA-seq brain samples. So far I'm not sure whether it's contamination or a real infection.

Reply modified 4.1 years ago • written 4.1 years ago by Tao (370)
4.1 years ago by
WouterDeCoster43k wrote:

I think you need a blank control, or another sample that you are absolutely sure doesn't contain any plant viral sequences. Next time you send samples for sequencing, include that negative control.

Written 4.1 years ago by WouterDeCoster (43k)