I just got my RNA-seq data.
CZ1_S6_L001_R1_001.fastq
CZ1_S6_L002_R1_001.fastq
CZ1_S6_L003_R1_001.fastq
CZ1_S6_L004_R1_001.fastq
I only know they are from 4 different lanes of the sequencing run.
What are they and do I need to combine these fastq files into one? And how?
Thanks~~~
If I had paired-end data, I would combine all the forward read files together, and all the reverse read files together, for all lanes and all flow cells, am I right (assuming there are no batch effects due to lanes and flow cells)?
One concern I have is that the index (barcode) sequences differ between the flow cells. Will this affect my analysis? Thanks!
If it is the same sample (where multiple libraries were made using separate indexes) and then run on multiple lanes/FCs, you could (in theory) combine the R1 files together and the R2 files together. It would be much better to use read groups (http://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups) to manage the aligned BAM files, keeping the raw data separate. For obvious reasons, one shouldn't combine data from unrelated samples (since that would defeat the purpose of indexing them in the first place).
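To make the read-group idea concrete, here is a rough Python sketch that splits an Illumina-style file name such as CZ1_S6_L001_R1_001.fastq into sample name, sample-sheet number, lane, and read, and builds one @RG header line per lane. It assumes the usual <sample>_S<n>_L<lane>_R<read>_001.fastq naming. The function names, the "sample.lane" ID scheme, and the "lib1" library label are made up for illustration; use whatever fits your own setup.

```python
import re

# Assumed naming convention: <sample>_S<n>_L<lane>_<read>_001.fastq[.gz]
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq(?:\.gz)?$"
)


def parse_fastq_name(filename):
    """Split a FASTQ file name into its sample, S-number, lane and read parts."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected FASTQ name: {filename}")
    return match.groupdict()


def read_group_line(filename, library="lib1", platform="ILLUMINA"):
    """Build a tab-separated @RG header line for one lane of one sample.

    The ID scheme (sample.lane) and the default library label are
    illustrative assumptions, not a required convention.
    """
    parts = parse_fastq_name(filename)
    rg_id = f"{parts['sample']}.L{parts['lane']}"
    fields = ["@RG", f"ID:{rg_id}", f"SM:{parts['sample']}",
              f"LB:{library}", f"PL:{platform}"]
    return "\t".join(fields)


if __name__ == "__main__":
    for lane in range(1, 5):
        print(read_group_line(f"CZ1_S6_L00{lane}_R1_001.fastq"))
```

All four lanes get the same SM value and a different ID, so downstream tools can treat them as one sample while the lane information stays in the BAM.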
Align the four FASTQ files separately and merge the resulting BAM files with samtools merge.
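A minimal sketch of the merge step, assuming each lane has already been aligned and coordinate-sorted into its own BAM. The BAM file names here are hypothetical, and the alignment itself is left out because it depends on which aligner you use. The only external command is samtools merge with the output file first, followed by the inputs.

```python
import shutil
import subprocess


def merge_command(output_bam, lane_bams):
    """Return the argument list for: samtools merge <out.bam> <in1.bam> <in2.bam> ..."""
    if len(lane_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    return ["samtools", "merge", output_bam, *lane_bams]


def run_merge(output_bam, lane_bams):
    """Run the merge, but only if samtools is actually installed."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(merge_command(output_bam, lane_bams), check=True)


# Hypothetical per-lane BAM names; the inputs should all be sorted the same way.
lane_bams = [f"CZ1_S6_L00{lane}.sorted.bam" for lane in range(1, 5)]
print(" ".join(merge_command("CZ1_S6.merged.bam", lane_bams)))
```

Call run_merge("CZ1_S6.merged.bam", lane_bams) once the four per-lane BAM files exist.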