I have fastq files for the same samples from a first and a second sequencing run, and they are kept in different directories as shown below:
First Run:
Data1
|_____ fastq_folder
       |_______ sample1
       |        |______ sample1_L1_R1.fastq.gz
       |        |______ sample1_L1_R2.fastq.gz
       |        |______ sample1_L2_R1.fastq.gz
       |        |______ sample1_L2_R2.fastq.gz
       |_______ sample2
                |______ sample2_L1_R1.fastq.gz
                |______ sample2_L1_R2.fastq.gz
                |______ sample2_L2_R1.fastq.gz
                |______ sample2_L2_R2.fastq.gz
Second Run:
Data2
|_____ fastq_folder
       |_______ sample1
       |        |______ sample1_L1_R1.fastq.gz
       |        |______ sample1_L1_R2.fastq.gz
       |_______ sample2
                |______ sample2_L1_R1.fastq.gz
                |______ sample2_L1_R2.fastq.gz
Usually, I run Salmon or Kallisto on the First Run files only, which are in the directory Data1.
Let's say I'm inside directory Data1, where I have a script named kallisto.sh. Inside the script, the fastq files are read like this:
r1=$(ls $fastq_folder/$sample/$sample*_R1.fastq.gz)
r2=$(ls $fastq_folder/$sample/$sample*_R2.fastq.gz)
But now I would also like to use the Second Run files in my script. How should I change r1 and r2 so that they pick up all the files from both the First Run and the Second Run?
P.S.: I know I could merge the files first and then run the analysis, but that might take a very long time at my workplace.
In any case you will need to merge the data within Data1: the files were run on different lanes but represent the same biological sample.
So something like
cat sample1_L1_R1.fastq.gz sample1_L2_R1.fastq.gz > sample1_R1.fastq.gz
(== join the data from different lanes into one file per biological sample)
Yes, I know this. Please check the last line of my post. It might take a huge time at my workplace for merging, so I'm looking for an alternative way.
You can use the find command with a certain depth, like here: How to concatenate multiple fastq files (located in different directories) for each sample
Is the sample1 naming consistent across folders and files?
Why would this take huge time? It will take up space, since you will duplicate the data for some time.
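The find suggestion above could look roughly like the sketch below. It assumes the directory layout from the question (run directory, then fastq_folder, then sample folder, so the files sit at depth 3) and that sample names are the same in both runs. list_reads is just a hypothetical helper name, and -mindepth/-maxdepth are available in GNU and BSD find but are not strictly POSIX.

```shell
# Sketch only: list the R1 or R2 files of one sample across several run
# directories with find. Assumed layout (from the question):
#   <run>/fastq_folder/<sample>/<sample>_L<lane>_R<1|2>.fastq.gz
# list_reads is a hypothetical helper name, not part of any tool.
list_reads() {
    local sample="$1" mate="$2"   # mate is R1 or R2
    shift 2                       # remaining arguments are the run directories
    find "$@" -mindepth 3 -maxdepth 3 -type f \
        -name "${sample}_*_${mate}.fastq.gz" | LC_ALL=C sort
}

# Possible use inside the script (paths must not contain whitespace):
#   r1=$(list_reads "$sample" R1 Data1 Data2)
#   r2=$(list_reads "$sample" R2 Data1 Data2)
```

Because both lists are sorted the same way, the n-th entry of r1 should be the mate of the n-th entry of r2, as long as every R1 file has its R2 file. As far as I know, Salmon accepts several files after -1 and after -2, so the two lists could be passed there directly without merging anything on disk.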
Not sure if this is what you're asking, but if the runs represent the same biological sample, you can just list the read pairs one right after another in the kallisto command:
kallisto quant -i index.idx -o output/ run1.r1.fq.gz run1.r2.fq.gz run2.r1.fq.gz run2.r2.fq.gz run3.r1.fq.gz run3.r2.fq.gz
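To build that list of pairs for the layout in the question without merging, something like the following might work. It is only a sketch: collect_pairs is a made-up helper name, it assumes every R1 file has an R2 file with the same name apart from the _R1/_R2 part, and index.idx and the output directory are placeholders.

```shell
# Sketch only: print the fastq files of one sample from several runs in
# the order R1 R2 R1 R2 ..., one path per line.
# Assumed layout (from the question):
#   <run>/fastq_folder/<sample>/<sample>_L<lane>_R<1|2>.fastq.gz
# collect_pairs is a hypothetical helper name.
collect_pairs() {
    local sample="$1" run r1 r2
    shift                                  # remaining arguments: run directories
    for run in "$@"; do
        for r1 in "$run"/fastq_folder/"$sample"/"$sample"*_R1.fastq.gz; do
            [ -e "$r1" ] || continue       # glob matched nothing in this run
            r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
            [ -e "$r2" ] || { echo "missing mate for $r1" >&2; return 1; }
            printf '%s\n' "$r1" "$r2"
        done
    done
}

# Possible use inside kallisto.sh, run from the directory that contains
# Data1 and Data2 (relies on word splitting, so no whitespace in paths):
#   kallisto quant -i index.idx -o output/"$sample" \
#       $(collect_pairs "$sample" Data1 Data2)
```

This keeps the lanes and runs as separate files on disk, so no extra copy of the data is written.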