Question

High BCV (biological coefficient of variation): is there any point in continuing the differential gene expression analysis?


13 months ago

Ann ▴ 40

Is it possible to perform differential gene expression analysis on data with this dispersion, BCV, and MDS plot (Fig. A, B)?

y_disp_design <- estimateDisp(y_filtered, design = design)

y_disp_design$common.dispersion

0.3251901
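For reference, edgeR defines the BCV as the square root of the dispersion, so the common dispersion printed above converts directly. A minimal sketch in base R that uses only the value shown above:

```r
# Common dispersion reported by estimateDisp() above
common_dispersion <- 0.3251901

# edgeR's BCV is the square root of the negative binomial dispersion
bcv <- sqrt(common_dispersion)
round(bcv, 2)  # 0.57
```

A common BCV of about 0.57 is above the roughly 0.4 that the edgeR User's Guide quotes as typical for well-controlled human data, and well above the roughly 0.1 it quotes for genetically identical model organisms.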
Is it possible to perform differential gene expression analysis on data with this experimental design?

I work with data from a non-model invertebrate in which a particular organ (a syncytial structure) develops in its tissues.

A - "normal" body tissues before organ development
B,C,D - 3 consecutive development stages of this organ

Each sample B,C,D contains contamination by "normal" tissues (sample A) as it was impossible to separate them.

It was assumed that the proportion of "normal" tissues (sample A) would be approximately the same in all samples, but, as I understand from the location of A samples on the MDS plot and the high values of BCV, this was not achieved.

The aim of the study was to identify some of the signaling pathways involved in the development of the organ of interest.

Are there any ways to analyze such data? Or do the problems described above make any statistical analysis impossible?

My pipeline: Trimming -> Trinity -> CD-HIT -> TransRate -> Salmon -> tximport -> edgeR

[Figure, panels A and B: plotMDS and plotBCV output]

BCV differential RNA-seq EdgeR design • 1.1k views


score 1 · Answer 1 · 2023-03-15

Variable levels of 'normal' contamination are a perennial issue in tumor sequencing, and a number of methods have been developed to address it. You could try something like IsoPureR (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0597-x) or ContamDE (https://academic.oup.com/bioinformatics/article/36/8/2492/5698700) to estimate a 'normal' profile and an organ profile for each mixed sample (B, C, D), which would 'decontaminate' those samples. There are other methods, and you may need to provide marker genes for the major cell types so that profiles can be modeled as linear combinations of cell types, but these approaches should be applicable in this setting.
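To make the "linear combination" idea concrete, here is a toy sketch in base R. It is not the algorithm used by IsoPureR or ContamDE; the gene names, expression values, and the assumption that some marker genes are expressed only in normal tissue are all made up for illustration. The organ profile is used here only to simulate the mixed sample and to check the result.

```r
# Toy expression profiles for 5 genes (made-up numbers, not real data)
normal_profile <- c(g1 = 200, g2 = 100, g3 = 50, g4 = 0,   g5 = 10)
organ_profile  <- c(g1 = 0,   g2 = 0,   g3 = 80, g4 = 120, g5 = 40)

# Simulate a mixed sample that is 30% normal tissue and 70% organ
true_fraction <- 0.3
mixed <- true_fraction * normal_profile + (1 - true_fraction) * organ_profile

# Hypothetical marker genes, assumed to be expressed only in normal tissue
markers <- c("g1", "g2")

# Estimate the normal-tissue fraction from the marker genes
est_fraction <- mean(mixed[markers] / normal_profile[markers])

# Subtract the estimated normal contribution and rescale
# to recover an estimate of the pure organ profile
est_organ <- (mixed - est_fraction * normal_profile) / (1 - est_fraction)

est_fraction
round(est_organ)
```

With real data the marker assumption rarely holds exactly and counts are noisy, which is why the published tools fit a statistical model rather than doing this direct subtraction.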

score 0 · Answer 2 · 2023-03-15


13 months ago

swbarnes2 14k

You hardly ever want to tell the people who paid for the experiment that there was "no point".

But it is awfully underpowered. Two replicates per condition really isn't enough, to say nothing of your other concerns.
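A rough way to see how little information is left: a sketch in base R of the design matrix for this layout. The group labels and the no-intercept parameterization are assumptions, since the original `design` object is not shown in the question.

```r
# Assumed layout: 4 groups (A-D) with 2 replicates each
group <- factor(rep(c("A", "B", "C", "D"), each = 2))

# One coefficient per group (no intercept), so contrasts are easy to write
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Residual degrees of freedom left for estimating dispersion
residual_df <- nrow(design) - qr(design)$rank
residual_df  # 4

# Contrast vectors for consecutive stages (rows in the order A, B, C, D)
contrasts <- cbind(
  B_vs_A = c(-1,  1,  0, 0),
  C_vs_B = c( 0, -1,  1, 0),
  D_vs_C = c( 0,  0, -1, 1)
)
rownames(contrasts) <- colnames(design)
contrasts

# With edgeR loaded, these would typically be used along the lines of:
#   fit <- glmQLFit(y_disp_design, design)
#   res <- glmQLFTest(fit, contrast = contrasts[, "C_vs_B"])
#   topTags(res)
```

Eight samples and four group means leave only 4 residual degrees of freedom, so with a high BCV only large, consistent changes are likely to reach significance.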



Initially there were three replicates for each sample, but I was forced to drop the third replicates from the analysis because I had doubts that they were dissected in the same way as the others. They were also sampled a year later than the other replicates and clearly showed a batch effect, and the BCV for the full dataset was even higher.
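For what it's worth, a batch that contains every group can in principle be modeled instead of discarded. The sketch below, in base R, only checks that such a design is estimable; the batch labels are assumptions, and an additive batch term would not correct for differences in how the samples were dissected.

```r
# Assumed layout: groups A-D, replicates 1-2 from the first sampling,
# replicate 3 collected a year later (treated here as a second batch)
group <- factor(rep(c("A", "B", "C", "D"), each = 3))
batch <- factor(rep(c("year1", "year1", "year2"), times = 4))

# Additive model: the batch shift is estimated separately from stage effects
design <- model.matrix(~ batch + group)

# Every group appears in both batches, so the design is full rank
full_rank <- qr(design)$rank == ncol(design)
residual_df <- nrow(design) - ncol(design)

full_rank    # TRUE
residual_df  # 7
```

Whether this helps depends on whether the third replicates differ by a consistent shift (which the batch term can absorb) or by tissue composition (which it cannot).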

The experiment was planned and carried out without my participation, so I can only work with the data I received. It seems to me that the most honest approach would be to shift the focus of the research to de novo transcriptome analysis (for example, comparison with transcriptomes from other species). However, I want to be sure that there is no other way.

Reply • 13 months ago by Ann ▴ 40