Yes, it is a good idea to map to both the host and the viral genome at the same time. It is more efficient, and reads that can map to both the viral and the human genome will be identified and dealt with appropriately, for instance by setting their mapping quality to 0, which is usually interpreted as an "ambiguous" mapping.
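To make the mapping-quality point concrete, here is a minimal sketch that tallies reads from SAM-formatted lines and sets aside those with a mapping quality of 0. It assumes that your aligner reports multi-mapping reads with MAPQ 0 (aligners differ on this, so check yours) and that viral contigs can be recognised by a name prefix. The `virus_` prefix, the function name and the toy records are all hypothetical.

```python
def count_unambiguous(sam_lines, viral_prefix="virus_"):
    """Tally primary alignments as host, virus, or ambiguous (MAPQ 0).

    viral_prefix is a hypothetical naming convention for the viral contigs
    in the combined reference; adapt it to your own FASTA headers.
    """
    counts = {"host": 0, "virus": 0, "ambiguous": 0}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        # skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if mapq == 0:
            counts["ambiguous"] += 1
        elif rname.startswith(viral_prefix):
            counts["virus"] += 1
        else:
            counts["host"] += 1
    return counts


# Toy records (made up for illustration): QNAME, FLAG, RNAME, POS, MAPQ, ...
sam_lines = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tvirus_segment1\t20\t42\t50M\t*\t0\t0\t*\t*",
    "r3\t0\tchr2\t300\t0\t50M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
example_counts = count_unambiguous(sam_lines)
print(example_counts)
```

In practice you would probably let your counting tool apply the mapping-quality threshold rather than parse SAM by hand; the sketch only shows what "dealing with" ambiguous reads amounts to.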
But mapping to both genomes doesn't mean that you have to take all genes into account when normalizing. Regardless of how you map, you have at least three options for normalizing the expression of the viral genes. Which option you choose depends on what you can safely assume:
1. Normalize the viral genes on the viral genes: To do that, you have to assume that there is no global change in viral gene expression between your conditions. This might be reasonable if the two strains are not too different in the context of infection.
2. Normalize the viral genes on the human genes: You have to assume that the viral load is the same in the two conditions (which might have been experimentally controlled) and that there is no global change in human gene expression between the two conditions.
3. Normalize the viral genes on both human and viral genes: You have to assume all of the above, or that any change in global gene expression in one species is balanced by a change in the other species. I think that this is harder to assume, so I would not recommend it.
(4. Normalize on a spike-in): You don't have to assume anything (except that the spike-in was properly mixed with the samples). But if your data has already been generated without a spike-in, then you have to choose one of the other options.
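To illustrate how the choice of reference genes changes the result, here is a minimal sketch that computes median-of-ratios size factors (the same idea DESeq-style normalization is built on, reimplemented here in plain NumPy rather than taken from any package) using only a chosen subset of genes. The count matrix, the `size_factors` helper and the variable names are made up for illustration.

```python
import numpy as np


def size_factors(counts, reference_rows):
    """Median-of-ratios size factors estimated from the reference rows only.

    counts: genes x samples array of raw counts.
    reference_rows: boolean mask selecting the genes used as reference.
    """
    ref = counts[reference_rows].astype(float)
    # keep genes with non-zero counts in every sample so the logs are defined
    ref = ref[(ref > 0).all(axis=1)]
    log_ref = np.log(ref)
    log_geo_mean = log_ref.mean(axis=1, keepdims=True)  # per-gene geometric mean
    return np.exp(np.median(log_ref - log_geo_mean, axis=0))  # one factor per sample


# Toy counts: 4 human genes then 3 viral genes, 2 samples (hypothetical numbers).
counts = np.array([
    [100, 200],
    [50, 100],
    [10, 20],
    [400, 800],
    [30, 30],
    [60, 60],
    [90, 90],
])
is_viral = np.array([False, False, False, False, True, True, True])
viral_counts = counts[is_viral]

sf_viral = size_factors(counts, is_viral)                         # option 1
sf_human = size_factors(counts, ~is_viral)                        # option 2
sf_both = size_factors(counts, np.ones(len(counts), dtype=bool))  # option 3

viral_by_viral = viral_counts / sf_viral
viral_by_human = viral_counts / sf_human
viral_by_both = viral_counts / sf_both

print("size factors (viral, human, both):", sf_viral, sf_human, sf_both)
```

In this toy matrix the human genes double in the second sample while the viral genes stay flat, so option 1 reports no change in the viral genes and option 2 reports that they are halved. Neither answer is wrong in itself; which one is meaningful depends on the assumption you are willing to make. If you do have a spike-in, the same helper could be pointed at the spike-in rows instead.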
modified 14 months ago
14 months ago by
Carlo Yague • 4.8k