Question: experimental design in RNA-seq differential expression analysis
12 days ago by
soren.narges0 wrote:

Hi, I am planning to run differential expression analysis and then enrichment analysis for an RNA-seq experiment from Arabidopsis. The RNA-seq count table has TAIR tags/IDs as rows. Some of these tags are non-coding genes (e.g. microRNAs), so when converting TAIR IDs to Entrez IDs, only around 80% are convertible (20% have no associated Entrez ID). Since the purpose of the study is detecting DE genes and performing enrichment analysis (and most non-coding genes do not have annotation available for enrichment analysis), is it wise to keep only the convertible tags and remove the rest (20% of features) from the beginning, i.e. before running the DE analysis?
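
For concreteness, the pre-filtering option being asked about could be sketched as follows. This is a minimal illustration only: the TAIR IDs, counts, Entrez IDs and the helper name `split_by_mapping` are all made up, and in practice the mapping would come from an annotation resource rather than a hand-written dictionary.

```python
# Hypothetical toy data: IDs, counts and the mapping are invented for illustration.
counts = {
    "AT1G00010": [120, 98, 110, 300, 280, 310],
    "AT1G00020": [5, 7, 4, 6, 5, 8],
    "AT1G00030": [50, 61, 47, 12, 9, 15],  # stands in for a non-coding feature
    "AT1G00040": [0, 1, 0, 2, 0, 1],
    "AT1G00050": [900, 870, 940, 880, 910, 905],
}
tair_to_entrez = {
    "AT1G00010": "900001",
    "AT1G00020": "900002",
    "AT1G00040": "900004",
    "AT1G00050": "900005",
}


def split_by_mapping(count_table, id_map):
    """Split a count table into rows with and without a converted ID."""
    mapped = {g: c for g, c in count_table.items() if g in id_map}
    unmapped = {g: c for g, c in count_table.items() if g not in id_map}
    return mapped, unmapped


mapped, unmapped = split_by_mapping(counts, tair_to_entrez)
fraction_mapped = len(mapped) / len(counts)
print(f"{len(mapped)} mapped, {len(unmapped)} unmapped "
      f"({fraction_mapped:.0%} convertible)")
```

Keeping the unmapped rows in a separate table, rather than discarding them, leaves both options open: the DE analysis can be run on either the full or the filtered table.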

modified 12 days ago by i.sudbery10k • written 12 days ago by soren.narges0
12 days ago by
i.sudbery10k
Sheffield, UK
i.sudbery10k wrote:

It probably won't make much of a difference either way. Including all the genes might give you very slightly more accurate normalization and dispersion estimates, but will give you a slightly worse multiple testing burden. Such genes will be automatically ignored in any enrichment analysis anyway.
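
To illustrate the multiple-testing side of this trade-off, here is a minimal sketch of Benjamini-Hochberg adjustment applied to the same p-values with and without some extra untestable-for-enrichment features. The p-values are made up, `bh_adjust` is a hand-written helper rather than a function from any DE package, and the extra features are assumed to carry no signal.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values: 8 annotated genes plus 2 unannotated features (20% of 10).
annotated = [0.001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]
unannotated = [0.4, 0.8]

adj_filtered = bh_adjust(annotated)
adj_all = bh_adjust(annotated + unannotated)[:len(annotated)]

n_sig_filtered = sum(p < 0.05 for p in adj_filtered)
n_sig_all = sum(p < 0.05 for p in adj_all)

for raw, f, a in zip(annotated, adj_filtered, adj_all):
    print(f"p={raw:<6} filtered={f:.3f} all={a:.3f}")
print(n_sig_filtered, n_sig_all)
```

In this toy case every adjusted p-value is a little larger when the extra features are kept, but the number of genes passing a 0.05 cutoff does not change. With real data the outcome depends on the p-values of the extra features, so this is only an illustration of the direction of the effect.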

written 12 days ago by i.sudbery10k

Exactly, I was thinking of the FDR values in both the DE and enrichment analyses. For example, 20% of the background reference would be unmapped for enrichment analysis. On the other hand, the sample size is very small (3 per group), so I wanted to have more rows for data sharing while running the DE analysis. All in all, I am not very confident in the final results because of the small sample size, and I wanted to act properly at every step.

written 12 days ago by soren.narges0

It would make no difference for the enrichment analysis because enrichment tools will filter the background automatically before they start.

written 12 days ago by i.sudbery10k

Yes, I meant that when correcting for multiple testing in the DE analysis, all the tags are used, and that affects the FDR values and the final list (number) of DE genes. But when performing e.g. Fisher's exact test in enrichment analysis, only 80% of the tags are used as the reference distribution (around 20% are unmapped and removed). I was just concerned about whether these two analysis steps must have the same total number of features or not. But yes, maybe it won't make much of a difference. Thanks!

modified 12 days ago • written 12 days ago by soren.narges0

Sorry if I'm being dense, but I'm confused. In the Fisher's test used for enrichment analysis, tags/tag counts/tag distributions aren't used, only the number of genes that are DE and the number of genes that are not DE; there is no reference distribution.

written 12 days ago by i.sudbery10k

Sorry if I haven't been clear. By reference distribution I meant the total number of detected genes in the experiment (# DE + # non-DE genes).

written 11 days ago by soren.narges0

Okay.

So what I'm trying to say is that the background distribution will be the same whether you leave them in or not, because even if you leave them in, the enrichment tool will automatically subtract them from the number of detected genes in the experiment (as well as from the number of DE genes).
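
The point can be checked with a small sketch of a one-sided Fisher's exact test (the hypergeometric upper tail). Everything here is hypothetical: the gene names, the DE calls, the gene set, and `run_enrichment`, which only mimics a tool that restricts its input to annotated genes before counting. It also assumes the DE calls themselves are the same in both cases.

```python
from math import comb


def enrichment_pvalue(de_in_set, de_total, set_size, background_size):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail)."""
    upper = min(de_total, set_size)
    total = comb(background_size, de_total)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, de_total - k)
        for k in range(de_in_set, upper + 1)
    )
    return tail / total


def run_enrichment(genes, gene_set, annotated):
    """Mimic a tool that first drops unannotated genes, then builds the 2x2 counts."""
    background = {g: de for g, de in genes.items() if g in annotated}
    de_genes = {g for g, de in background.items() if de}
    return enrichment_pvalue(
        de_in_set=len(de_genes & gene_set),
        de_total=len(de_genes),
        set_size=len(gene_set & set(background)),
        background_size=len(background),
    )


# Invented example: 10 detected genes, 8 annotated, 4 called DE (one unannotated).
all_genes = {f"g{i}": i in (1, 2, 3, 9) for i in range(1, 11)}
annotated = {f"g{i}" for i in range(1, 9)}
gene_set = {"g1", "g2", "g4"}

# Option A: hand the tool everything. Option B: remove unannotated genes first.
prefiltered = {g: de for g, de in all_genes.items() if g in annotated}
p_all = run_enrichment(all_genes, gene_set, annotated)
p_pre = run_enrichment(prefiltered, gene_set, annotated)
print(p_all, p_pre)
```

Both calls end up with the same counts (8 background genes, 3 DE, 3 in the set, 2 in both), so the p-values are identical. Any difference between the two workflows would therefore have to come from the DE step, not from the enrichment test.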

written 11 days ago by i.sudbery10k