Question

experimental design in RNA-seq differential expression analysis

3.3 years ago

soren.narges • 0

Hi, I am planning to run differential expression analysis followed by enrichment analysis for an RNA-seq experiment in Arabidopsis. The RNA-seq count table has TAIR tags/IDs as rows. I suspect some of these tags are non-coding genes (e.g. microRNAs), which would explain why only around 80% are convertible when I convert TAIR IDs to Entrez IDs (the other 20% have no associated Entrez ID). Since the purpose of the study is detecting DE genes and performing enrichment analysis (and most non-coding genes have no annotation available for enrichment analysis), is it wise to keep only the convertible tags and remove the rest (20% of features) from the beginning, i.e. before running the DE analysis?

RNA-Seq DE analysis experimental design • 795 views

updated 3.3 years ago by i.sudbery 19k • written 3.3 years ago by soren.narges • 0

score 2 · Accepted Answer · 2021-01-13

3.3 years ago

i.sudbery 19k

It probably won't make much of a difference either way. Including all the genes might give you very slightly more accurate normalization and dispersion estimates, but will give you a slightly worse multiple testing burden. Genes without an Entrez ID will be automatically ignored in any enrichment analysis anyway.

3.3 years ago by i.sudbery 19k
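
To illustrate the "multiple testing burden" point, here is a minimal sketch of the Benjamini-Hochberg adjustment (the method DESeq2 uses by default, equivalent to p.adjust(method = "BH") in R), written in plain Python. The p-values are hypothetical, and it assumes the extra features (e.g. non-coding genes) are mostly not DE and so have large p-values; under that assumption, adding them raises the adjusted p-values of the other genes.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down, so each value is min(p * m / rank) over higher ranks
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for 8 genes that have an Entrez ID
coding = [0.0001, 0.0004, 0.002, 0.01, 0.03, 0.2, 0.5, 0.9]
# Two hypothetical extra features without an Entrez ID, assumed not DE
with_extra = coding + [0.6, 0.8]

q_coding = bh_adjust(coding)
q_all = bh_adjust(with_extra)[: len(coding)]  # same 8 genes, adjusted among 10 tests
for p, a, b in zip(coding, q_coding, q_all):
    print(f"p={p:<7} adj(coding only)={a:.4f} adj(all features)={b:.4f}")
```

The effect is small here and would be small in a real experiment too, which is consistent with the answer above: the extra rows mostly cost a little power at the adjustment step.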

Exactly, I was thinking of the FDR values in both the DE and the enrichment analysis: 20% of the background reference would be unmapped for the enrichment analysis. On the other hand, the sample size is very small (3 per group), so I wanted to keep more rows for sharing information across genes while running the DE analysis. All in all, I am not very confident in the final results because of the small sample size, so I want to make sure every step is done properly.

3.3 years ago by soren.narges • 0

It would make no difference for the enrichment analysis because enrichment tools will filter the background automatically before they start.

3.3 years ago by i.sudbery 19k

Yes, I meant that when correcting for multiple testing in the DE analysis, all the tags are used, and that affects the FDR values and the final number of DE genes. But when performing e.g. Fisher's exact test in the enrichment analysis, only 80% of the tags (the roughly 20% that are unmapped are removed) are used as the reference distribution. I was just concerned whether these two analysis steps must have the same total number of features. But yes, maybe it won't make much of a difference. Thanks!

3.3 years ago by soren.narges • 0

Sorry if I'm being dense, but I'm confused. In the Fisher's test used for enrichment analysis, tags, tag counts and tag distributions aren't used, only the number of genes that are DE and the number of genes that are not DE. There is no reference distribution.

3.3 years ago by i.sudbery 19k
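
As a rough illustration of the point that the test only sees gene counts, here is a minimal sketch of a one-sided Fisher's exact test for over-representation, computed as the hypergeometric upper tail with the standard library. The function name and the example numbers are hypothetical; for a 2x2 table this should give the same p-value as a one-sided ("greater") Fisher's exact test in a statistics package.

```python
from math import comb

def enrichment_pvalue(de_in_term, de_total, term_size, universe_size):
    """One-sided Fisher's exact test (hypergeometric upper tail) for over-representation.

    Only gene counts enter the test: DE genes in the term, DE genes overall,
    genes in the term, and the number of annotated genes tested (the background).
    """
    # P(X >= de_in_term) where X ~ Hypergeometric(universe_size, term_size, de_total)
    upper = min(term_size, de_total)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, de_total - k)
        for k in range(de_in_term, upper + 1)
    )
    return tail / comb(universe_size, de_total)

# Hypothetical numbers: 1000 annotated genes, 50 DE, 40 in the term, 10 of them DE
print(enrichment_pvalue(10, 50, 40, 1000))
```

Nothing about read counts or expression levels appears in the calculation; the "background" is just the integer universe_size.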

Sorry if I haven't been clear. By reference distribution I meant the total number of detected genes in the experiment (# DE + # non-DE genes).

3.3 years ago by soren.narges • 0

Okay.

So what I'm trying to say is, the background distribution will be the same whether you leave them in or not, because even if you leave them in, the enrichment tool will automatically subtract them from the number of detected genes in the experiment (as well as from the number of DE genes).

3.3 years ago by i.sudbery 19k
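
A minimal sketch of that background filtering, assuming the enrichment tool intersects every gene list with the set of annotated IDs before building the 2x2 table. The gene IDs (g1 to g10) and which of them are "unmapped" are hypothetical. It also assumes the same genes are called DE either way; re-running the DE analysis after removing rows can shift the DE calls slightly, which is the small effect discussed in the accepted answer.

```python
def contingency_counts(genes, de_genes, term_genes, annotated):
    """Build the 2x2 table an over-representation test uses, restricted to annotated genes.

    genes:      all detected gene IDs (may include IDs with no annotation)
    de_genes:   IDs called DE
    term_genes: IDs annotated to the term being tested
    annotated:  IDs that have any annotation (e.g. a mapped Entrez ID)
    """
    universe = set(genes) & set(annotated)  # unmapped IDs drop out of the background...
    de = set(de_genes) & universe           # ...and out of the DE list
    term = set(term_genes) & universe
    a = len(de & term)                # DE, in term
    b = len(de - term)                # DE, not in term
    c = len((universe - de) & term)   # not DE, in term
    d = len(universe - de - term)     # not DE, not in term
    return a, b, c, d

genes = [f"g{i}" for i in range(1, 11)]
annotated = genes[:8]                 # hypothetical: g9 and g10 have no Entrez ID
de_all = ["g1", "g2", "g3", "g9"]     # g9 is DE but unmapped
term = ["g1", "g2", "g5"]

full = contingency_counts(genes, de_all, term, annotated)
prefiltered = contingency_counts(annotated, [g for g in de_all if g in annotated], term, annotated)
print(full, prefiltered)  # identical tables, so the enrichment p-value is identical too
```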