Question: Virus sequence detected in my RNA-seq reads
3.3 years ago by
mfalco0 wrote:

Hi, I recently received the fastq files from a sequencing service. This was an experiment with human cell lines, and along with the sequences they gave me the following warning:

"we detected the presence of Xenotropic murine leukemia virus sequences in some of the samples, resulting in a higher than expected percentage of no matches in the mapping statistics."

Do you think I should remove these sequences from my reads before alignment? If so, how can I do it?

Thank you

virus rna-seq alignment • 916 views

An easy way to identify those reads and set them aside is to add the XMLV sequence as an extra chromosome in your reference file. Build the index from this combined reference, map your reads, and then look at the coverage over the viral contig to gauge the contamination.

Before removing them, check in a genome browser (e.g. IGV) whether they really are what the sequencing service said.
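As a rough illustration of the "extra chromosome" idea, here is a minimal Python sketch that appends a viral genome to a host reference FASTA. The function name, the file names and the contig name `XMLV` are placeholders made up for this example, and it assumes the viral FASTA holds a single record. The aligner index still has to be rebuilt from the combined file with whatever tool you normally use.

```python
import os
import tempfile


def build_combined_reference(host_fasta, virus_fasta, out_fasta, virus_name="XMLV"):
    """Write the host reference followed by the viral genome as one extra contig.

    Assumes the viral FASTA contains a single sequence record.
    """
    with open(out_fasta, "w") as out:
        # Copy the host reference unchanged, making sure the last line ends
        # with a newline so the viral header starts on its own line.
        with open(host_fasta) as host:
            for line in host:
                out.write(line if line.endswith("\n") else line + "\n")
        # Replace the original viral header with a short, predictable name.
        out.write(f">{virus_name}\n")
        with open(virus_fasta) as virus:
            for line in virus:
                if line.startswith(">") or not line.strip():
                    continue
                out.write(line.strip() + "\n")
    return out_fasta


if __name__ == "__main__":
    # Toy sequences only, to show the call; real inputs would be the human
    # reference and a viral genome downloaded from a sequence database.
    workdir = tempfile.mkdtemp()
    host = os.path.join(workdir, "host.fa")
    virus = os.path.join(workdir, "virus.fa")
    with open(host, "w") as fh:
        fh.write(">chr1\nACGTACGT\n")
    with open(virus, "w") as fh:
        fh.write(">toy viral record\nGGCC\nTTAA\n")
    combined = build_combined_reference(host, virus, os.path.join(workdir, "combined.fa"))
    print(open(combined).read())
```

Giving the viral contig a short fixed name makes it easy to pick out later in per-contig read counts or in a genome browser.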

modified 3.3 years ago • written 3.3 years ago by VHahaut (1.1k)

Contamination of NGS data with sequences of unknown provenance is not unusual; if you search PubMed you will find many reports. Detecting a few XMLV reads may be acceptable, as opposed to massive contamination (what fraction of reads are XMLV?). Do you or anyone nearby work with mouse? If not, the contamination could also have originated at the sequencing provider (if they made the libraries).
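To put a number on "what fraction of reads are XMLV?", one option is to map against a reference that includes the virus as its own contig and then read the per-contig counts from `samtools idxstats`, which prints tab-separated lines of contig name, contig length, mapped count and unmapped count. The sketch below only parses that text; the contig name `XMLV` and the example counts are made up, and the result is a rough estimate rather than an exact read count.

```python
def viral_fraction(idxstats_text, virus_name="XMLV"):
    """Estimate the fraction of all reads that mapped to the viral contig.

    idxstats_text: text in the four-column, tab-separated layout printed by
    `samtools idxstats` (name, length, mapped, unmapped).
    """
    viral = 0
    total = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, unmapped = line.split("\t")
        mapped, unmapped = int(mapped), int(unmapped)
        # The denominator includes unmapped reads (the "*" line) so that the
        # fraction is relative to everything that was sequenced.
        total += mapped + unmapped
        if name == virus_name:
            viral += mapped
    return viral / total if total else 0.0


if __name__ == "__main__":
    # Invented counts for illustration only.
    example = "chr1\t1000\t900\t0\nXMLV\t8000\t50\t0\n*\t0\t0\t50\n"
    print(f"viral fraction: {viral_fraction(example):.3f}")
```

Running this per sample gives a quick table of which libraries carry a handful of viral reads and which are heavily contaminated.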

modified 3.3 years ago • written 3.3 years ago by GenoMax (93k)

You could try to make the situation work in your favor: see how the patients/cell lines with the virus differ from the rest of the samples in their transcriptomic profile.

written 3.3 years ago by Chirag Nepal (2.3k)
3.3 years ago by
WouterDeCoster44k wrote:

Those reads will most probably not influence your alignment and as such also won't have an influence on your downstream analysis.

But what is way worse is that this viral infection will influence your biological results. Your cells will behave differently when infected and will show a different transcriptomic profile, e.g. more antiviral genes will be expressed. So that's a problem.

The best you can do is identify which samples are affected and specify this as a covariate in your model for differential expression analysis.
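To make the covariate idea concrete, here is a small Python sketch that encodes infection status as an extra column of a design matrix next to the condition of interest. The sample names, the two-level `condition` and the helper names are all hypothetical; in practice the DE package builds this matrix from a formula (in DESeq2 something along the lines of `~ infected + condition`). The second helper flags the case where infection cannot be separated from the condition.

```python
def build_design(samples, reference_condition):
    """Return (column_names, rows) with intercept, condition and infection columns.

    samples: list of dicts with keys 'name', 'condition' and 'infected' (bool).
    Assumes a condition with exactly two levels.
    """
    columns = ["intercept", "condition", "infected"]
    rows = []
    for s in samples:
        rows.append([
            1,
            0 if s["condition"] == reference_condition else 1,
            1 if s["infected"] else 0,
        ])
    return columns, rows


def is_confounded(rows):
    """True if infection status is constant or tracks the condition exactly."""
    cond = [r[1] for r in rows]
    inf = [r[2] for r in rows]
    if len(set(inf)) < 2:
        return True  # all samples share one status: nothing to adjust for
    return inf == cond or inf == [1 - c for c in cond]


if __name__ == "__main__":
    # Hypothetical sample sheet.
    samples = [
        {"name": "ctrl_1", "condition": "control", "infected": False},
        {"name": "ctrl_2", "condition": "control", "infected": True},
        {"name": "trt_1", "condition": "treated", "infected": False},
        {"name": "trt_2", "condition": "treated", "infected": True},
    ]
    columns, rows = build_design(samples, reference_condition="control")
    print(columns)
    for s, r in zip(samples, rows):
        print(s["name"], r)
    print("confounded:", is_confounded(rows))
```

If the check reports confounding, for instance when every treated sample is infected and no control is, the covariate cannot rescue the comparison and the affected samples would need to be dropped or re-sequenced.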
