I am new to bioinformatics and, due to circumstances, I have been given full responsibility for a human transcriptomics data analysis, without a bioinformatics background and, for now, without supervision.
This whole project feels like a big challenge, where I find one puzzle piece at a time.
We have QuantSeq data for 600 human patients, split into two groups of 300. The two groups have conditions that are similar in some respects but different in others, and all the data is from one time point. The data is preprocessed and I already have the unique read counts per sample in a table.
I have a couple of questions and I hope you guys can help me:
As far as I can tell we do not have biological replicates: I have the unique counts per sample for about 70K genes, and every column in my R data frame corresponds to one patient (so not two columns per patient, as I would expect with biological replicates). Am I right to assume that we do not have biological replicates?
The steps I have taken so far are (using the edgeR manual as a guide):
- Loaded the dataset into R.
- Made a data frame where the columns correspond to the samples and the rows to the genes.
- Made a DGEList object with the right condition per patient.
- Filtered out lowly expressed genes with a raw count <10.
- Performed normalization with the built-in TMM normalization method.
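For concreteness, here is a minimal sketch of those steps in edgeR, run on simulated counts rather than the real table. The toy matrix, the group labels `A` and `B`, and the use of `filterByExpr` (whose default `min.count` is 10) for the low-count filter are assumptions for illustration; the actual filter rule may have been applied differently.

```r
library(edgeR)

# Simulated stand-in for the real count table (hypothetical data):
# rows are genes, columns are patients, one column per patient.
set.seed(1)
n_genes     <- 200
n_per_group <- 6
n_samples   <- 2 * n_per_group

# First half of the genes is barely expressed, second half is well expressed.
# The mu vector is recycled down each column, so gene i always gets mu[i].
mu <- rep(c(0.5, 100), each = n_genes / 2)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("patient", seq_len(n_samples)))
)

# One condition label per patient (assumed labels).
group <- factor(rep(c("A", "B"), each = n_per_group))

# Build the DGEList object with the condition attached to each sample.
y <- DGEList(counts = counts, group = group)

# Drop lowly expressed genes, then recompute library sizes.
keep <- filterByExpr(y, group = group)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalization; the factors are stored in y$samples$norm.factors.
y <- calcNormFactors(y, method = "TMM")

print(dim(y))
print(head(y$samples))
```

With the real data, `counts` would be the gene-by-patient table and `group` the two-level condition factor in the same order as the columns.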
Are the steps I did logical and did I miss something?
I would also like to know how to proceed from here. I am expected to perform a differential expression analysis and a pathway analysis, but I get the sense that not having biological replicates might be a big problem. Does anyone have tips on how to proceed?
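To make the question concrete, this is roughly the two-group differential expression workflow described in the edgeR manual, sketched on simulated data. It assumes that the patients within each group can be treated as independent samples of that group, which is exactly the point in doubt above, so it is an illustration of the calls involved and not a validated analysis. All data, group labels, and the simulated effect are hypothetical, and pathway analysis is not shown.

```r
library(edgeR)

# Simulated two-group data (hypothetical): 500 genes, 10 patients per group.
set.seed(2)
n_genes     <- 500
n_per_group <- 10
group <- factor(rep(c("A", "B"), each = n_per_group))

# Baseline mean of 100 everywhere; the first 25 genes are raised in group B.
mu <- matrix(100, nrow = n_genes, ncol = 2 * n_per_group)
mu[1:25, group == "B"] <- 400

counts <- matrix(
  rnbinom(length(mu), mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("patient", seq_len(2 * n_per_group)))
)

# Same preprocessing as before: DGEList, low-count filter, TMM.
y <- DGEList(counts = counts, group = group)
y <- y[filterByExpr(y), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# Design matrix with an intercept (group A) and a B-versus-A coefficient.
design <- model.matrix(~ group)

# Estimate dispersions from the variation between patients within each group.
y <- estimateDisp(y, design)

# Quasi-likelihood fit and test of the group effect (second coefficient).
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = 2)

# Full results table, sorted by p-value, with FDR-adjusted values.
top <- topTags(res, n = Inf)$table
print(head(top))
```

With the real table, any known covariates (for example batch, sex, or age) could be added to the `model.matrix` formula, but whether that is needed depends on the study design.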
Any help is greatly appreciated and earns you a digital cappuccino!