RNA Seq analysis of virally infected cells
5.5 years ago

Hi,

I am working with a virally infected cell line, so my RNA-Seq reads are a combination of host and viral transcripts. I am interested in the viral transcripts that are differentially regulated between the WT and a mutant cell line, and I am not sure about the best way to analyze this data.

1. Should I align the reads to a combined viral+host genome, count the reads separately for the virus and the host using their respective GTF files, and then combine the counts before using them as input for DESeq2?
2. Or is it better to align the reads to the virus first, then align the unmapped reads to the host, and proceed as before?

Also, is DESeq2 the best way to do differential gene expression (DGE) analysis on small genomes, or is there a better way to normalize the raw counts before comparing fold changes? I am new to RNA-Seq and would appreciate any input!

Thank you!

RNA-Seq alignment • 2.3k views
If you're working with an RNA virus, remember that you'll have to find a way to differentiate between the viral genome (and any genomic intermediates) and viral mRNA.

It is a DNA virus, so I don't think that will be a problem for me. But thanks for pointing it out.

5.5 years ago

A variant of method 1 is your best bet. I would strongly recommend that you put the combined counts for the host and virus into DESeq2 at the same time. You'll use the combined counts for estimateSizeFactors() and can then extract only the counts for the viral genes for the actual DE part. This will better safeguard you against missing a global change in viral expression due to normalization.
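To illustrate why the host counts matter for normalization, here is a minimal plain-Python sketch of the median-of-ratios idea that estimateSizeFactors() is based on. It is not DESeq2 itself, and the gene names, count values, and helper functions (`size_factors`, `fold_change`) are made up for illustration. The toy data assume four samples (two WT, two mutant) in which every viral gene is 4x higher in the mutant.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a dict of gene -> per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue  # a zero count makes the geometric mean zero, so skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def fold_change(row, factors):
    """Mutant / WT ratio of normalized counts (samples 0-1 are WT, 2-3 mutant)."""
    norm = [c / f for c, f in zip(row, factors)]
    return (norm[2] + norm[3]) / (norm[0] + norm[1])


# Hypothetical toy data: samples are WT1, WT2, mut1, mut2.
depth = [1, 2, 1, 2]        # relative sequencing depth per sample
viral_shift = [1, 1, 4, 4]  # global 4x increase of viral expression in the mutant

host = {f"host{i}": [base * d for d in depth]
        for i, base in enumerate([100, 300, 80, 500, 40, 1000])}
viral = {f"viral{i}": [base * d * s for d, s in zip(depth, viral_shift)]
         for i, base in enumerate([50, 20, 10])}

sf_combined = size_factors({**host, **viral})  # host genes anchor the median
sf_viral_only = size_factors(viral)            # the global shift is absorbed

fc_combined = fold_change(viral["viral0"], sf_combined)
fc_viral_only = fold_change(viral["viral0"], sf_viral_only)

print("size factors (host+viral):", [round(s, 3) for s in sf_combined])
print("size factors (viral only):", [round(s, 3) for s in sf_viral_only])
print("viral0 fold change, host+viral factors:", round(fc_combined, 3))
print("viral0 fold change, viral-only factors:", round(fc_viral_only, 3))
```

In this toy case the size factors computed from host+viral counts track sequencing depth only, so the 4x viral change survives normalization. Size factors computed from the viral genes alone absorb the global shift and the fold change collapses to about 1.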

I am not sure I understand. When you say variant, do you mean at the level of DESeq2 or prior to that?

Also, I am not sure I understand how to do the DE only for the viral genes when the counts include both viral + host. Can you help me with that?

Thanks!

"Variant" as in the common English language meaning of the word, nothing to do with what you're apparently thinking of.

You can subset a DESeqDataSet object. If you only care about the viral genes, then just look at them, but use the host genes to set the size factors.

Thanks! I will try that.

I have a question about the method you recommended for DESeq2. Is there a way to estimate size factors using the entire data (host+viral) but then calculate differential expression for only the viral genes?

The way I am doing it right now is to take the entire count table (host+viral) that I got from HTSeq, calculate differential expression for all the genes, and then extract the results for just the viral genes. Will the fold change differ between the two approaches (if the first one is possible at all)?

Yes, you can just subset the DESeqDataSet after calculating the size factors. The way you're currently doing it, you're likely decreasing your statistical power.
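One possible contributor to that loss of power, which the reply does not spell out, is multiple-testing correction: DESeq2's results() adjusts p-values with the Benjamini-Hochberg (BH) method by default, and testing thousands of host genes alongside a handful of viral genes changes the adjustment. The plain-Python sketch below uses invented p-values and a hypothetical helper `bh_adjust` to show the effect in a toy case where the host genes are all null. It is an illustration only. With other data, subsetting can also make adjusted p-values larger.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values: 5 viral genes, 995 host genes with no real signal.
viral_p = [0.001, 0.004, 0.012, 0.03, 0.2]
host_p = [(k + 0.5) / 995 for k in range(995)]  # evenly spread over (0, 1)

# Approach A: test everything, then pull out the viral genes.
adj_all = bh_adjust(viral_p + host_p)
adj_viral_from_all = adj_all[:len(viral_p)]

# Approach B: adjust over the viral genes only.
adj_viral_only = bh_adjust(viral_p)

n_sig_full = sum(p < 0.05 for p in adj_viral_from_all)
n_sig_subset = sum(p < 0.05 for p in adj_viral_only)

print("viral genes with padj < 0.05, tested with host genes:", n_sig_full)
print("viral genes with padj < 0.05, tested on their own:   ", n_sig_subset)
```

In this toy case no viral gene passes the 0.05 threshold when it is adjusted together with the host genes, while four pass when the viral genes are adjusted on their own.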

I find this discussion very useful for my own question. Can we use the same method to calculate DE of host genes? We have a few virus-infected sheep samples, but each sample has a different amount of virus present in the tissue, so I think this may affect the detection of significant gene expression changes in infected vs. normal sheep. My idea is to use the combined virus and host counts in DESeq2, normalise them, and then run the DE part for sheep (host) only. Will that help? I am worried about how it will affect the normalisation of control samples, where no virus is present.

If you only want to test the host genes, then just use them and come up with some sort of "viral infection level" metric (e.g., the number of mapped viral reads divided by 1000).
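A minimal sketch of such a metric, assuming a per-gene viral count table with one column per sample. The sample names, gene names, counts, and the helper `viral_load` are all hypothetical. How the metric then enters the model (for example as a per-sample covariate) is not specified in the reply and is left open here.

```python
def viral_load(viral_counts, scale=1000):
    """Total mapped viral reads per sample, divided by `scale`."""
    per_sample_totals = [sum(column) for column in zip(*viral_counts.values())]
    return [total / scale for total in per_sample_totals]


# Hypothetical samples: two uninfected controls and three infected animals.
samples = ["ctrl1", "ctrl2", "inf1", "inf2", "inf3"]
viral_counts = {
    "viral_gene_A": [0, 0, 1200, 5400, 800],
    "viral_gene_B": [0, 0, 300, 2600, 200],
}

load = dict(zip(samples, viral_load(viral_counts)))
for name, value in load.items():
    print(f"{name}\t{value}")
```

Controls get a value of 0 by construction, so the metric separates uninfected from infected samples and also ranks the infected ones by how much virus was sequenced.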