Hey, I am doing RNA-seq data analysis for a complex disorder. I have paired samples for each patient (i.e., both the tumor and the non-tumor tissue are taken from the same patient). The files I received look like this:
Patient1_R1_L1_01.fq.gz, Patient1_R1_L1_02.fq.gz, Patient1_R1_L1_03.fq.gz
Patient1_R2_L1_01.fq.gz, Patient1_R2_L1_02.fq.gz, Patient1_R2_L1_03.fq.gz
Patient2_R1_L1_01.fq.gz, Patient2_R1_L1_02.fq.gz, Patient2_R1_L1_03.fq.gz, Patient2_R1_L1_04.fq.gz, Patient2_R1_L1_05.fq.gz
Patient2_R2_L1_01.fq.gz, Patient2_R2_L1_02.fq.gz, Patient2_R2_L1_03.fq.gz, Patient2_R2_L1_04.fq.gz, Patient2_R2_L1_05.fq.gz
The problem is that when I merge these files using `cat`, the size of the concatenated file varies a lot between patients (from ~2 GB for one patient to ~9 GB for another). I assume this difference in file size also reflects a difference in the number of reads per sample.
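For context, this is roughly how I am merging the per-lane files and checking my assumption about read counts. The patient names and filename pattern below are just the ones from my example above; gzip files can be concatenated directly with `cat`, since the result is a valid multi-member gzip archive.

```shell
# Merge per-lane FASTQ files for each patient and read direction,
# then count reads in the merged file (one FASTQ record = 4 lines).
for patient in Patient1 Patient2; do
    for read in R1 R2; do
        cat "${patient}_${read}_L1_"*.fq.gz > "${patient}_${read}.fq.gz"
        reads=$(( $(zcat "${patient}_${read}.fq.gz" | wc -l) / 4 ))
        echo "${patient} ${read}: ${reads} reads"
    done
done
```

The merged files do confirm that the per-patient read counts differ substantially.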
So my question is: how should I handle this scenario? At what step of the analysis, and with what kind of normalization, should this difference in library size be addressed?
**P.S.: Please bear with me if I am asking too silly a question. I am a newbie to this field.**