Hello, I have 75 bp paired-end RNASeq data generated on an Illumina HiSeq 2000. The protocol multiplexes 7 samples per lane, on lanes 1-7 of each flowcell, and each sample has a 6 bp index associated with it. With this protocol, each sample ends up with ~50 small .fastq.gz files for the left reads and ~50 small .fastq.gz files for the right reads. These small files are generated automatically by the sequencer. My question is how to combine and store the raw .fastq.gz files.
I used the command “cat” to combine these 50 small .fastq.gz files into one large .fastq.gz file, as follows for sample “2894” (is this the right way?):
cat 2894_CCTTCA_L00_R1. .fastq.gz > 2894_R1.fastq.gz
cat 2894_CCTTCA_L00_R2. .fastq.gz > 2894_R2.fastq.gz
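For anyone wanting to check this kind of concatenation, here is a minimal sketch. It relies on the fact that the gzip format allows several compressed members back to back in one file, so the output of cat on .gz files is itself a valid .gz file. The chunk file names and the two toy reads are made up for illustration and are not the sequencer's real output.

```shell
set -e
workdir=$(mktemp -d)   # scratch directory; every file name below is hypothetical

# Two tiny gzip-compressed FASTQ chunks standing in for the sequencer's small files.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/chunk_001.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/chunk_002.fastq.gz"

# Concatenate the compressed files directly, the same way as the cat commands above.
cat "$workdir/chunk_001.fastq.gz" "$workdir/chunk_002.fastq.gz" > "$workdir/combined.fastq.gz"

# gzip -t tests the integrity of the combined file and fails if it is corrupt.
gzip -t "$workdir/combined.fastq.gz"

# Checksum the decompressed combined file and the decompressed chunks (in order).
combined_sum=$(gzip -dc "$workdir/combined.fastq.gz" | cksum)
chunks_sum=$(gzip -dc "$workdir/chunk_001.fastq.gz" "$workdir/chunk_002.fastq.gz" | cksum)

# Count the lines in the combined file (4 lines per FASTQ record).
combined_lines=$(gzip -dc "$workdir/combined.fastq.gz" | wc -l | tr -d '[:space:]')

echo "combined: $combined_sum"
echo "chunks:   $chunks_sum"
echo "lines in combined file: $combined_lines"
```

If the two checksums match, the combined file decompresses to the same reads, in the same order, as the small files. For paired-end data the R1 and R2 chunks should be concatenated in the same order so that mates stay in sync. A shell glob expands in sorted order, which normally takes care of this when the chunk numbering is consistent between R1 and R2.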
After this, I have two .fastq.gz files for each sample. I think these are the files I want for analysis (TopHat), and also for uploading to a public repository (SRA) when I publish my results.
However, the support staff in our sequencing core suggested that it is better to keep the original small .fastq.gz files, for two reasons:
1. They are truly raw, that is, they are the files generated automatically by the machine.
2. Bowtie2/TopHat2 can take these small files as input directly.
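To illustrate the second reason: TopHat accepts a comma-separated list of files for each mate, so the small files could be passed without combining them first. The sketch below only builds the two lists and prints the command rather than running it. The sample_R1_*/sample_R2_* file names, index_base and out_dir are placeholders, and whether a given TopHat/Bowtie2 version reads .gz input directly is something to confirm for your installation.

```shell
set -e
demo=$(mktemp -d)   # scratch directory holding empty placeholder files

# Hypothetical chunk names; the real sequencer file names will differ.
touch "$demo/sample_R1_001.fastq.gz" "$demo/sample_R1_002.fastq.gz" \
      "$demo/sample_R2_001.fastq.gz" "$demo/sample_R2_002.fastq.gz"

# The glob expands in sorted order; paste joins the names with commas.
r1_list=$(printf '%s\n' "$demo"/sample_R1_*.fastq.gz | paste -sd, -)
r2_list=$(printf '%s\n' "$demo"/sample_R2_*.fastq.gz | paste -sd, -)

# Print (do not run) the command; index_base and out_dir are placeholders.
echo tophat -o out_dir index_base "$r1_list" "$r2_list"
```

The R1 and R2 lists must name the chunks in the same order so that the mates stay paired.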
Keep in mind that our RNASeq project is big, and we cannot afford to keep both the small .fastq.gz files and the combined .fastq.gz files for each sample. So I would like to ask for your suggestions. If you could keep only one copy of the raw .fastq.gz files, which would you routinely keep for each sample:
the combined big .fastq.gz file, or the original 50 small .fastq.gz files generated by the machine?
Many thanks, Shirley