DESeq2 with subset of genes
4.0 years ago

Dear All,

I have RNAseq data of a hybrid yeast, which has a lot of gene conversion and loss of heterozygosity between two genomes. I also have RNAseq data of one of its parents.

I was able to phase only 300 genes out of 6000. What I want is to compare gene expression levels between the hybrid and this parent. Since only these 300 genes are phased, only 2% of reads map uniquely in the hybrid, while in the parent around 90% do.

So my question is whether it is legitimate to run DESeq2 on only this subset of 300 genes. I am wondering whether it is OK to compare such different library sizes.

Thanks,

RNA-Seq DESeq2 hybrid • 1.9k views

In my experience, you may run into some normalization problems. Maybe you could try an ANOVA-type test(?). But somebody here with more DESeq2 experience should comment on your situation.


Hi Venu, you're right, my gut feeling says that conceptually it might not fit DESeq2. Regarding ANOVA, what exactly do you suggest? Thanks


Do you really need to phase the alignments to do this? If the two parents are quite similar I would think it'd be better to align to one genome (or use an allele-specific pipeline, ignoring the fact that you don't actually care about allele-specific expression) and use the counts from all of the genes.


Hi Devon, allele-specific expression is exactly what I had to do, that's why I phased the genes :) So now I want to compare the parent and the homeolog. Here we have quite a complex genome and the usual allele-specific pipelines fail, since >70% of the genome has undergone conversion and LOH.


Those 2% vs 90% aligned reads, are those referring to the entire genome or to the specific 300 genes? If the 300 genes of interest are similarly well covered, it may be feasible. You could use the standard kallisto/salmon - tximport - DESeq2 routine just using those 300 genes. At least technically, that should be doable.


2% refers to the 300 genes (the rest are multimappers), and 90% refers to the entire genome. I used a STAR-DESeq2 pipeline.

What I want to try is mapping the parent only to the subset of these 300 genes and then using DESeq2.


In the parental line, when you map to the entire genome, what is the mapping rate on the subset of 300 genes? Could you clarify a bit how you mapped? For example:

Parental ==> Mapped to entire genome ==> 90% uniquely mapping
Hybrid ==> Mapped to entire genome ==> not working, too much noise
Parental ==> Mapped to subset (300 genes) ==> ???% uniquely mapping
Hybrid ==> Mapped to subset (300 genes) ==> 2% uniquely mapping
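
One way to answer the first question without remapping is to take the whole-genome count table of the parent and compute what share of its assigned reads falls on the phased genes. The snippet below is only a minimal sketch: the gene names and counts are made-up toy values, and `subset_read_fraction` is a hypothetical helper, not part of STAR or DESeq2. Note that it gives the fraction of reads already assigned to genes, not the fraction of all sequenced reads.

```python
def subset_read_fraction(counts, subset_genes):
    """Return the fraction of gene-assigned reads that fall on subset_genes.

    counts: dict mapping gene ID -> read count (e.g. from a whole-genome run)
    subset_genes: set of gene IDs of interest (e.g. the phased genes)
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("count table contains no reads")
    in_subset = sum(n for gene, n in counts.items() if gene in subset_genes)
    return in_subset / total


# Toy, made-up numbers purely for illustration.
parent_counts = {"geneA": 30, "geneB": 20, "geneC": 950}
phased_genes = {"geneA", "geneB"}

fraction = subset_read_fraction(parent_counts, phased_genes)
print(f"{fraction:.1%} of assigned reads fall on the phased genes")
```

If this fraction in the parent is close to the hybrid's unique mapping rate on the same genes, the two libraries are roughly comparable on that subset.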

Parental-> mapped to parental genome -> 90% unique maps
Hybrid-> mapped to phased genome -> 2.5 % unique maps
Parent-> mapped to subset (300 genes) -> 2.9 %


So I think I will subset these 300 genes from the whole-genome mapping, normalize the library size based only on these genes, and compare the result with the hybrid.
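
To make the idea of normalizing on the subset concrete, here is a rough pure-Python sketch of the median-of-ratios size factor calculation, which is the approach DESeq2 uses by default, restricted to a chosen gene list. It is an illustration under assumptions, not DESeq2 itself: the sample names, gene names, and counts are invented, and `size_factors` is a hypothetical helper.

```python
import math
from statistics import median


def size_factors(counts, genes):
    """Median-of-ratios size factors computed only from the given genes.

    counts: dict mapping sample name -> dict of gene ID -> raw count
    genes: iterable of gene IDs to base the normalization on
    """
    samples = list(counts)
    # Only genes with a nonzero count in every sample have a finite log mean.
    usable = [g for g in genes if all(counts[s].get(g, 0) > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has nonzero counts in all samples")
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_ref = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # Size factor = median ratio of a sample's counts to the pseudo-reference.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_ref[g] for g in usable))
        for s in samples
    }


# Toy, made-up counts: the parent has far more reads, plus a non-phased gene.
toy_counts = {
    "parent": {"g1": 100, "g2": 200, "g3": 400, "g4": 5000},
    "hybrid": {"g1": 10, "g2": 20, "g3": 40},
}
phased = ["g1", "g2", "g3"]

sf = size_factors(toy_counts, phased)
normalized = {s: {g: toy_counts[s][g] / sf[s] for g in phased} for s in toy_counts}
print(sf)
```

In this toy case the parent is exactly ten times deeper on the phased genes, so its size factor comes out ten times larger than the hybrid's and the normalized counts agree. The non-phased gene `g4` plays no part in the calculation.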

4.0 years ago

You should probably map both parental and hybrid samples to references of similar size, i.e., if you're going to focus on those 300 genes, then use those for the parental strain, too.

Disclaimer: I don't think I've understood all the details of your project.