I have a dataset of 10 distinct biological samples. Each sample was sequenced (RNA-Seq) with paired-end reads and 6 barcodes. So for each of the ten biological samples I have 12 .fastq files (6 barcodes x 2 mates), with names like *ATACTC_1, *ATACTC_2, *GTGCTC_1, *GTGCTC_2...and so on.
I would like to follow this pipeline:
1. Analyze data quality with FastQC
2. Trim data with Trim Galore!
3. Align with TopHat2
4. ??? generate counts, differential expression analysis, etc.
Here is my question: at what point in this pipeline can I (and should I) combine the .fastq data? If each sample has 12 files associated with it, at what point do I collapse those 12 files into a single dataset representing the RNA-Seq data for one biological sample, so that I can generate counts from it in step #4?
My guess is that I combine all of the .fastq files for a sample up front (re-multiplex?) with cat file1...file12. Alternatively, I could run steps #1-2 or steps #1-3 on each file separately, and then combine the output of step #3.
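To make the up-front option concrete, here is a minimal Python sketch of what I have in mind. It rests on assumptions that may not match my real data: the file names are taken to follow a hypothetical <sample>_<barcode>_<mate>.fastq pattern, the files are uncompressed, and the helper names group_fastqs and merge_fastqs and the *_merged_* output names are made up for illustration. The one thing it tries to get right is that the _1 and _2 files are concatenated in the same barcode order, so that read pairs stay in sync between the two merged files.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed (hypothetical) naming scheme: <sample>_<barcode>_<mate>.fastq
NAME_RE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[ACGTN]+)_(?P<mate>[12])\.fastq$"
)


def group_fastqs(paths):
    """Map (sample, mate) -> list of files, sorted by barcode."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        match = NAME_RE.match(path.name)
        if match is None:
            continue  # skip anything that does not follow the assumed pattern
        key = (match["sample"], match["mate"])
        groups[key].append((match["barcode"], path))
    # Sorting by barcode gives mate 1 and mate 2 the same file order.
    return {key: [p for _, p in sorted(items)] for key, items in groups.items()}


def merge_fastqs(in_dir, out_dir):
    """Concatenate per-barcode files into one file per sample and mate."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, mate), parts in group_fastqs(Path(in_dir).glob("*.fastq")).items():
        target = out_dir / f"{sample}_merged_{mate}.fastq"
        with open(target, "wb") as dst:
            for part in parts:
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, dst)  # byte-for-byte, like cat
        merged[(sample, mate)] = target
    return merged
```

Calling merge_fastqs("raw", "merged") (both directory names are placeholders) would leave two files per sample, one for each mate, which could then go through steps #1-3 as a single pair.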
Thank you for any help you can provide! This board has already been tremendously helpful to me.