Question: RNA Seq analysis of virally infected cells
2.2 years ago by
divya.nandakumar20 wrote:


I am working with a virally infected cell line. My RNA Seq reads are therefore a combination of host and viral transcripts. I am interested in the viral transcripts that are differentially regulated between the WT and a mutant cell line. I am not sure about the best way to analyze this data.

  1. Should I align the reads to a combined viral+host genome, count the reads separately for the virus and the host using their respective GTF files, and then combine the counts as input for DESeq2?
  2. Or is it better to align the reads to the virus first, then align the unmapped reads to the host, and proceed as before?
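To make the "count separately, then combine" step in option 1 concrete, here is a minimal sketch in Python. It assumes htseq-count style output (one `gene_id<TAB>count` pair per line, with summary rows prefixed by `__`). The gene IDs, the toy counts, and the helper names `parse_htseq` and `combine_counts` are made up for illustration.

```python
def parse_htseq(text):
    """Parse htseq-count style text into {gene_id: count}."""
    counts = {}
    for line in text.strip().splitlines():
        gene, value = line.split("\t")
        if gene.startswith("__"):
            # Summary rows such as __no_feature are not genes; skip them.
            continue
        counts[gene] = int(value)
    return counts


def combine_counts(host, virus):
    """Merge host and viral counts for one sample into a single table."""
    overlap = set(host) & set(virus)
    if overlap:
        raise ValueError(f"gene IDs present in both tables: {sorted(overlap)}")
    combined = dict(host)
    combined.update(virus)
    return combined


# Hypothetical counts for a single sample.
host_txt = "HOST_A\t500\nHOST_B\t1200\n__no_feature\t300\n"
virus_txt = "VIR_1\t80\nVIR_2\t15\n__ambiguous\t2\n"

host_counts = parse_htseq(host_txt)
virus_counts = parse_htseq(virus_txt)
combined = combine_counts(host_counts, virus_counts)

# Keep the viral gene IDs so the viral rows can be pulled out again later.
viral_ids = sorted(virus_counts)
```

Keeping the list of viral gene IDs alongside the combined table is what makes it possible to pull out the viral rows again after normalization.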

Also, is DESeq2 the best way to do differential gene expression (DGE) analysis for small genomes, or is there a better way to normalize the raw counts before comparing fold changes? I am new to RNA-seq and would appreciate any input!

Thank you!

rna-seq alignment • 1.1k views
modified 2.2 years ago by Devon Ryan89k • written 2.2 years ago by divya.nandakumar20

If you're working with an RNA virus, remember that you'll have to find a way to differentiate between the viral genome (and any genomic intermediates) and viral mRNA.

written 2.2 years ago by pld4.8k

It is a DNA virus, so I don't think that will be a problem for me. But thanks for pointing it out.

written 2.2 years ago by divya.nandakumar20
2.2 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

A variant of method 1 is your best bet. I would strongly recommend that you put the combined counts for the host and virus into DESeq2 at the same time. You'll use the combined counts for estimateSizeFactors() and can then extract only the viral genes for the actual DE part. This better safeguards you against missing a global change in viral expression due to normalization.
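To illustrate why the size factors should come from the combined counts, here is a small pure-Python sketch of median-of-ratios normalization, the idea behind estimateSizeFactors(). It is not DESeq2 itself, and the toy counts and gene names are invented: the host genes are identical in both samples while every viral gene doubles in the mutant.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for {gene: [count per sample]}."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            # Genes with a zero count have no usable geometric mean; skip.
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def fold_change(row, sf):
    """Normalized count in sample 2 (mutant) over sample 1 (WT)."""
    return (row[1] / sf[1]) / (row[0] / sf[0])


# Columns are [WT, mutant]; all values are hypothetical.
host = {"H1": [100, 100], "H2": [200, 200], "H3": [50, 50],
        "H4": [400, 400], "H5": [300, 300]}
virus = {"V1": [10, 20], "V2": [30, 60], "V3": [40, 80]}
combined = {**host, **virus}

sf_combined = size_factors(combined)    # driven by the stable host genes
sf_viral_only = size_factors(virus)     # absorbs the global viral shift

fc_combined = fold_change(virus["V1"], sf_combined)
fc_viral_only = fold_change(virus["V1"], sf_viral_only)
```

In this toy case the combined size factors keep the twofold viral increase visible, whereas size factors estimated from the viral genes alone normalize it away, so the fold change comes out at about 1.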

written 2.2 years ago by Devon Ryan89k

I am not sure I understand. When you say variant, do you mean at the level of DESeq2 or prior to that?

Also, I am not sure I understand how to do the DE only for the viral genes when the counts include both viral + host. Can you help me with that?


written 2.2 years ago by divya.nandakumar20

"Variant" as in the common English language meaning of the word, nothing to do with what you're apparently thinking of.

You can subset a DESeqDataSet object. If you only care about the viral genes, then just look at them, but include the host genes when setting the size factors.

written 2.2 years ago by Devon Ryan89k

Thanks! I will try that.

written 2.2 years ago by divya.nandakumar20

I have a question about the method you recommended for DESeq2. Is there a way to estimate size factors using the entire data (host+viral) but then calculate differential expression for only the viral genes?

Right now I am taking the entire counts (host+viral) that I got from HTSeq, calculating differential expression for all the genes, and then extracting the results for just the viral genes. Will the fold change differ between the two methods (if the first one is possible at all)?

Thank you for your help!

written 2.2 years ago by divya.nandakumar20

Yes, you can just subset the DESeqDataSet after calculating the size factors. The way you're currently doing it, you're likely decreasing your statistical power.
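One plausible contributor to that loss of power (an interpretation, not something spelled out in this thread) is the multiple-testing burden: adjusted p-values computed across all host and viral genes are penalized for many more tests than those computed across the viral genes alone. The sketch below implements a plain Benjamini-Hochberg adjustment in Python with invented p-values; it is not DESeq2's own code, which also applies independent filtering.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values: 3 viral genes and 97 unremarkable host genes.
viral_p = [0.001, 0.01, 0.03]
host_p = [(k + 4) / 100 for k in range(97)]  # 0.04, 0.05, ..., 1.00

# Adjusting over the viral genes only (subset before testing).
viral_alone = bh_adjust(viral_p)

# Adjusting over everything, then extracting the viral genes afterwards.
viral_in_all = bh_adjust(viral_p + host_p)[:3]
```

With these made-up numbers all three viral genes pass a 0.05 cutoff when adjusted on their own, and none do when adjusted together with the host genes. Real data will be less extreme, but the direction of the effect is the same.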

written 2.2 years ago by Devon Ryan89k

I find this discussion very useful for my question. Can we use the same method to calculate DE of host genes? We have a few virus-infected sheep samples, but each sample has a different amount of virus present in the tissue, so I think this may affect the detection of significant expression changes in infected vs. normal sheep. My idea is to put the combined virus and host counts into DESeq2, normalise them, and then run the DE part for the sheep (host) genes only. Will that help? I am worried about how it will affect the normalisation of the control samples, where no virus is present.

modified 5 weeks ago • written 5 weeks ago by Deepali Vasoya10

If you only want to test the host genes, then just use them and come up with some sort of "viral infection level" metric (for example, the number of mapped viral reads divided by 1000).
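As a rough sketch of such a metric, the snippet below sums the mapped viral reads per sample and divides by 1000, as suggested above. The sample names, gene IDs, and counts are invented. Using the result as a per-sample covariate in the model is an assumption about how the metric would be applied, not something stated in this thread.

```python
def infection_level(viral_counts, scale=1000):
    """Per-sample viral load: total mapped viral reads divided by `scale`."""
    return {
        sample: sum(genes.values()) / scale
        for sample, genes in viral_counts.items()
    }


# Hypothetical viral read counts per sample; controls carry no viral reads.
viral_counts = {
    "control_1": {"VIR_1": 0, "VIR_2": 0},
    "infected_1": {"VIR_1": 2000, "VIR_2": 500},
    "infected_2": {"VIR_1": 30000, "VIR_2": 10000},
}

levels = infection_level(viral_counts)
```

Uninfected controls simply get a level of 0, so they stay in the analysis without the viral reads influencing how the host counts are normalized.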

written 5 weeks ago by Devon Ryan89k