Question: DESeq2 with subset of genes
24 months ago, grant.hovhannisyan wrote:

Dear All,

I have RNAseq data of a hybrid yeast, which has a lot of gene conversion and loss of heterozygosity between two genomes. I also have RNAseq data of one of its parents.

I was able to phase only 300 genes out of 6000. What I want is to compare gene expression levels between the hybrid and this parent. Since only these 300 genes are phased, I got only 2% uniquely mapped reads in the hybrid, while in the parent there are around 90%.

So my question is whether it is legitimate to use DESeq2 for only this subset of 300 genes. I am wondering whether it is OK to compare such different library sizes.

Thanks,

rna-seq deseq2 hybrid • 1.2k views

In my experience, you may run into some normalization problems. Maybe you could try an ANOVA-type test(?). But somebody here with more DESeq2 experience should comment on your situation.

written 24 months ago by venu

Hi Venu, you're right, my gut feeling says that conceptually it might not fit DESeq2. Regarding ANOVA, what exactly do you suggest? Thanks

written 24 months ago by grant.hovhannisyan

Do you really need to phase the alignments to do this? If the two parents are quite similar I would think it'd be better to align to one genome (or use an allele-specific pipeline, ignoring the fact that you don't actually care about allele-specific expression) and use the counts from all of the genes.

modified 24 months ago • written 24 months ago by Devon Ryan

Hi Devon, allele-specific expression is exactly what I had to do, that's why I phased the genes :) So now I want to compare the parent and the homeolog. Here we have quite a complex genome and the usual allele-specific pipelines fail, since >70% of the genome has undergone conversion and LOH.

written 24 months ago by grant.hovhannisyan

Those 2% vs 90% aligned reads, are those referring to the entire genome or to the specific 300 genes? If the 300 genes of interest are similarly well covered, it may be feasible. You could use the standard kallisto/salmon - tximport - DESeq2 routine just using those 300 genes. At least technically, that should be doable.

written 24 months ago by Friederike
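
A minimal sketch of the "just use those 300 genes" step, in plain Python and not tied to any of the tools named above: it filters a gene-by-sample count table down to a gene list and reports, per sample, what fraction of the counted reads falls inside that list. The gene IDs, sample names, counts, and both function names are made up for illustration.

```python
# Toy gene-by-sample count table; gene IDs, sample names and counts are made up.
counts = {
    "geneA": {"parent": 900, "hybrid": 30},
    "geneB": {"parent": 450, "hybrid": 12},
    "geneC": {"parent": 8000, "hybrid": 0},  # not phased, so excluded below
    "geneD": {"parent": 120, "hybrid": 5},
}
phased_genes = {"geneA", "geneB", "geneD"}  # stands in for the 300 phased genes


def subset_counts(counts, keep):
    """Return only the rows (genes) whose ID is in `keep`."""
    return {gene: row for gene, row in counts.items() if gene in keep}


def fraction_in_subset(counts, keep):
    """Per sample, the fraction of all counted reads that fall in `keep`."""
    samples = next(iter(counts.values())).keys()
    fractions = {}
    for s in samples:
        total = sum(row[s] for row in counts.values())
        inside = sum(row[s] for gene, row in counts.items() if gene in keep)
        fractions[s] = inside / total if total else float("nan")
    return fractions


subset = subset_counts(counts, phased_genes)
fractions = fraction_in_subset(counts, phased_genes)
print(sorted(subset))
print(fractions)
```

If this fraction differs strongly between samples, that is a hint that the total read count is a poor stand-in for library size and that any normalization should probably be based on the subset itself.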

The 2% refers to the 300 genes (the rest are multimappers), and the 90% refers to the entire genome. I used the STAR-DESeq2 pipeline.

What I want to try is mapping the parent only to the subset of these 300 genes and then using DESeq2.

written 24 months ago by grant.hovhannisyan

In the parental line, when you map to the entire genome, what is the mapping rate on the subset of 300 genes? Could you maybe clarify a bit how you mapped? For example:

Parental ==> Mapped to entire genome ==> 90% uniquely mapping
Hybrid ==> Mapped to entire genome ==> not working, too much noise
Parental ==> Mapped to subset (300 genes) ==> ???% uniquely mapping
Hybrid ==> Mapped to subset (300 genes) ==> 2% uniquely mapping

written 24 months ago by VHahaut

Parental -> mapped to parental genome -> 90% unique maps
Hybrid -> mapped to phased genome -> 2.5% unique maps
Parent -> mapped to subset (300 genes) -> 2.9%

So I think I will take the counts for these 300 genes from the whole-genome mapping, normalize the library size based only on these genes, and compare the result with the hybrid.

modified 24 months ago • written 24 months ago by grant.hovhannisyan
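
The normalization described above can be sketched as follows. DESeq2 estimates size factors with a median-of-ratios procedure (`estimateSizeFactors` in R); the pure-Python version below only illustrates that idea on a count table already restricted to the phased genes, and it is not DESeq2's code. Sample names, gene IDs, and counts are hypothetical.

```python
import math
from statistics import median

# Hypothetical counts for phased genes only (one parent and one hybrid sample).
subset_counts = {
    "geneA": {"parent": 100, "hybrid": 10},
    "geneB": {"parent": 200, "hybrid": 20},
    "geneC": {"parent": 50, "hybrid": 5},
    "geneD": {"parent": 400, "hybrid": 400},  # behaves differently from the rest
    "geneE": {"parent": 0, "hybrid": 3},      # has a zero, so it is skipped
}
samples = ["parent", "hybrid"]


def size_factors(counts, samples):
    """Median-of-ratios size factors computed from the given genes only."""
    log_ratios = {s: [] for s in samples}
    for row in counts.values():
        values = [row[s] for s in samples]
        if min(values) <= 0:
            continue  # a zero count makes the geometric mean zero; skip the gene
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s, v in zip(samples, values):
            log_ratios[s].append(math.log(v) - log_geo_mean)
    if not all(log_ratios.values()):
        raise ValueError("no gene has non-zero counts in every sample")
    return {s: math.exp(median(r)) for s, r in log_ratios.items()}


sf = size_factors(subset_counts, samples)
# Divide each count by its sample's size factor to put samples on one scale.
normalized = {
    gene: {s: row[s] / sf[s] for s in samples}
    for gene, row in subset_counts.items()
}
print(sf)
```

In this toy table the parent counts are about ten times the hybrid counts for most genes, so the two size factors differ by roughly that factor, and the one gene that behaves differently does not move the median. With only a few hundred genes the estimate rests on far fewer ratios than usual, so it is worth checking that most of those genes are not themselves differentially expressed.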
24 months ago, Friederike (United States) wrote:

You should probably map both the parental and the hybrid samples to references of similar size, i.e., if you're going to focus on those 300 genes, then use those for the parental strain, too.

Disclaimer: I don't think I've understood all the details of your project.
