Dear Biostars community,
I am hoping to get some advice on an analysis I intend to perform. I have RNA-seq FASTQ data from a set of patient samples, made available through a collaboration with a hospital. Unfortunately, each patient in the study had only a single tumour tissue sample sent for RNA-seq, so there is one set of FASTQ files per patient. For healthy tissue, we have one sample with three replicates.
My main question is how I should perform downstream analysis once I have obtained read counts from the aligned FASTQ files. I am familiar with cell line and xenograft-based RNA-seq analysis with 3 or more replicates per condition, where the reads are aligned with STAR, counted with HTSeq-count, and the raw counts are fed into DESeq2 to find differentially expressed (DE) genes. The DE genes can then be subjected to gene ontology or GSEA analysis, etc. But when I have many patients with a single sample each, how do I perform meaningful downstream analysis, even if true statistical significance cannot be determined?
On a side note, I am aware that edgeR can handle comparisons without replicates if the dispersion is specified manually, but I've seen many other forum posts suggesting that this method is subject to user bias and other limitations.
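To illustrate why I am wary of choosing the dispersion myself, here is a minimal Python sketch (not edgeR itself, and not its exact test) based only on the negative binomial variance, mu + dispersion * mu^2, where the dispersion is the squared biological coefficient of variation (BCV). The counts and the BCV values are hypothetical numbers I picked for illustration.

```python
import math

def nb_sd(mu, bcv):
    """Standard deviation of a negative binomial count with mean `mu`.

    Variance = mu + dispersion * mu**2, where dispersion = bcv**2.
    """
    return math.sqrt(mu + (bcv ** 2) * mu ** 2)

# Hypothetical numbers: expected count in healthy tissue and the count
# observed in a single tumour sample for the same gene.
mu_healthy = 100
observed = 200

# The same twofold difference, judged under three assumed BCV values.
for bcv in (0.1, 0.2, 0.4):
    z = (observed - mu_healthy) / nb_sd(mu_healthy, bcv)
    print(f"BCV={bcv}: observed count is {z:.1f} SDs above the expected count")
```

The same twofold change looks much more or much less extreme depending on the BCV I assume, which is the user bias I am worried about.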
Some questions I am thinking about are:
Should I just look at gene expression levels on a gene-by-gene basis and not focus so much on statistical differential expression?
Can I group patients by age, cancer stage, or other demographics and look at variation in gene expression across these groups?
What kind of figures should I aim to produce from such an analysis? (Probably no volcano plots, since there would be no DE genes.)
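To make the questions above concrete, this is roughly the kind of descriptive analysis I have in mind, as a minimal Python sketch using only the standard library. It normalises counts to log2 counts-per-million, computes each patient's log fold change against the mean of the three healthy replicates, and averages those within a clinical group. All gene names, counts, patient IDs, and stage labels are made up, and there is no significance testing here.

```python
import math
from statistics import mean

def log_cpm(counts, pseudocount=1.0):
    """Convert one sample's raw counts to log2 counts-per-million."""
    total = sum(counts.values())
    return {g: math.log2(c / total * 1e6 + pseudocount) for g, c in counts.items()}

def log_fold_changes(tumour, healthy_reps):
    """Per-gene log2 fold change of one tumour vs. the mean healthy log-CPM."""
    healthy = [log_cpm(rep) for rep in healthy_reps]
    t = log_cpm(tumour)
    return {g: t[g] - mean(h[g] for h in healthy) for g in t}

def mean_lfc_by_group(lfc_by_patient, group_of):
    """Average the per-patient log fold changes within each group."""
    grouped = {}
    for patient, lfc in lfc_by_patient.items():
        grouped.setdefault(group_of[patient], []).append(lfc)
    return {grp: {g: mean(l[g] for l in lfcs) for g in lfcs[0]}
            for grp, lfcs in grouped.items()}

# Hypothetical toy data: three healthy replicates, one tumour per patient.
healthy_reps = [
    {"GENE_A": 100, "GENE_B": 900},
    {"GENE_A": 110, "GENE_B": 890},
    {"GENE_A": 95, "GENE_B": 905},
]
tumours = {
    "P1": {"GENE_A": 500, "GENE_B": 500},
    "P2": {"GENE_A": 300, "GENE_B": 700},
    "P3": {"GENE_A": 120, "GENE_B": 880},
}
stage = {"P1": "III", "P2": "III", "P3": "I"}  # hypothetical staging

lfc = {p: log_fold_changes(c, healthy_reps) for p, c in tumours.items()}
for grp, genes in mean_lfc_by_group(lfc, stage).items():
    print(grp, {g: round(v, 2) for g, v in genes.items()})
```

The per-patient values could go into a heatmap or a PCA plot coloured by stage, which is the sort of figure I imagine replacing a volcano plot.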
Thank you in advance!
Is the healthy sample from an independent person (not one of the patients)? Or is there a "healthy" sample from each patient (desirable but highly unlikely)?
It's from an independent donor (not one of the patients).
What question are you trying to answer with this dataset?