I've got paired-end (PE) Illumina HiSeq RNA-seq data. I trimmed the adapters and then aligned the reads to the reference genome. Before proceeding to transcript assembly and quantification, I would like to know how to decide which of the three biological replicates I should take forward for further analysis.
The question is somewhat strange because it implies that you decided to use replicates with the expectation that 1 or more of them would fail(?). I'm not sure that I would spend a couple of hundred pounds sterling (GBP) or ~10,000 rupees if I was later going to decide to ditch 1 or more of the samples.
If you've used a HiSeq and the laboratory personnel are experienced, then I imagine that you can use any of the replicates.
Things that most people do with replicates:
- process them as separate samples and then, after normalization, check how they line up on PC1 vs. PC2 via principal components analysis
- average counts over the replicates post normalisation (this was more common in cDNA microarray analysis)
- concatenate the raw data FASTQ files together (gzip) and then process them as a single sample
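
To make the first option a bit more concrete, here is a minimal sketch of the PC1 vs. PC2 check in Python with NumPy. It is only an illustration under assumptions: the count matrix is simulated, the sample names are made up, and the normalization is a plain log2 counts-per-million, which stands in for whatever normalization your own pipeline uses.

```python
import numpy as np

def log_cpm(counts, prior=1.0):
    """Scale a genes x samples count matrix to log2 counts per million."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)           # total counts per sample
    return np.log2(counts / lib_sizes * 1e6 + prior)

def pca_scores(expr, n_components=2):
    """Return per-sample PC scores and the fraction of variance per PC."""
    x = expr.T                               # samples x genes
    x = x - x.mean(axis=0)                   # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# Simulated data (an assumption for illustration): 200 genes,
# 3 control and 3 treated replicates, 40 genes up-regulated 8-fold.
rng = np.random.default_rng(0)
base = rng.gamma(shape=2.0, scale=50.0, size=200)
fold = np.ones(200)
fold[:40] = 8.0
samples = ["ctrl_1", "ctrl_2", "ctrl_3", "trt_1", "trt_2", "trt_3"]
counts = np.column_stack(
    [rng.poisson(base) for _ in range(3)]
    + [rng.poisson(base * fold) for _ in range(3)]
)

scores, explained = pca_scores(log_cpm(counts))
for name, (pc1, pc2) in zip(samples, scores):
    print(f"{name}\tPC1={pc1:8.2f}\tPC2={pc2:8.2f}")
print("variance explained:", np.round(explained, 3))
```

Replicates of the same condition should sit close together on PC1 and PC2; a replicate that lands far from its group is the one worth a closer look before deciding anything.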
You mention assembly, so I would concatenate your samples together and then do the de novo transcriptome assembly on the concatenated sample. That said, all transcriptome assemblers that I've used allow you to specify multiple samples at the command line and then merge them together anyway.
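
If you do go down the concatenation route, a rough sketch of it in Python could look like the one below. Gzip files can be appended to each other byte-for-byte and the result is still a valid gzip file, so there is no need to decompress first. The file names (rep1_R1.fastq.gz, pooled_R1.fastq.gz and so on) are hypothetical and the tiny FASTQ records are written only so that the example runs on its own. With paired-end data, the important point is to concatenate the R1 files and the R2 files in the same replicate order so that the mates stay in sync.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def concatenate_gzip(inputs, output):
    """Append gzip files byte-for-byte into one multi-member gzip file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

# Write one toy read per replicate and mate (placeholder data only).
tmp = Path(tempfile.mkdtemp())
replicates = ["rep1", "rep2", "rep3"]
for mate in ("R1", "R2"):
    for rep in replicates:
        record = f"@{rep}_read1/{mate[-1]}\nACGT\n+\nIIII\n"
        with gzip.open(tmp / f"{rep}_{mate}.fastq.gz", "wt") as fh:
            fh.write(record)

# Pool each mate separately, using the same replicate order for both.
for mate in ("R1", "R2"):
    concatenate_gzip(
        [tmp / f"{rep}_{mate}.fastq.gz" for rep in replicates],
        tmp / f"pooled_{mate}.fastq.gz",
    )

# Read the pooled files back to confirm that they decompress cleanly.
with gzip.open(tmp / "pooled_R1.fastq.gz", "rt") as fh:
    pooled_r1 = fh.read()
with gzip.open(tmp / "pooled_R2.fastq.gz", "rt") as fh:
    pooled_r2 = fh.read()

print(pooled_r1.splitlines()[::4])  # header lines of the pooled R1 reads
```

The pooled R1 and R2 files can then be given to the assembler as a single paired-end sample.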
If any of the samples 'failed', I doubt that you'd have the data in hand right now. You should be able to confirm the basic quality of the samples by contacting the lab that did the sequencing, or just check the reports that they sent.
Good luck, Kevin