Hello
I am using the kallisto | bustools workflow for single-cell RNA-seq analysis. In their tutorial (https://www.kallistobus.tools/multiple_files_tutorial.html), they run the kallisto bus command on four pairs of fastq files as follows:
$ kallisto bus -i Mus_musculus.GRCm38.cdna.all.idx -o bus_output/ -x 10xv2 -t 4 \
bamtofastq_S1_L001_R1_001.fastq.gz \
bamtofastq_S1_L001_R2_001.fastq.gz \
bamtofastq_S1_L002_R1_001.fastq.gz \
bamtofastq_S1_L002_R2_001.fastq.gz \
bamtofastq_S1_L003_R1_001.fastq.gz \
bamtofastq_S1_L003_R2_001.fastq.gz \
bamtofastq_S1_L004_R1_001.fastq.gz \
bamtofastq_S1_L004_R2_001.fastq.gz
However, I have around 800 fastq.gz files, so listing all of them on the kallisto bus command line as above is impractical. I need to process all of them in one go using a loop. So I took a kallisto quant loop from an RNA-seq course and adapted it for kallisto bus (for the same paired samples mentioned above), as below. However, it did not produce any output.
## Run kallisto:
# for four paired samples (-n 8):
find /scratch/fs/kallisto_bustools_multiple_lanes/fastqs -name "*_[R1R2]_001.fastq.gz" | sort | head -n 8 | while read -r FW_READ
do
    read -r RV_READ
    FILEBASE=$(basename "${FW_READ%_R1_001.fastq.gz}")
    kallisto bus -i /scratch/fs/kallisto_bustools_multiple_lanes/Mus_musculus.GRCm38.cdna.all.idx -x 10xv2 \
        -o . -t 16 "$FW_READ" "$RV_READ"
    # kallisto doesn't let us specify an output filename, so we rename all output files
    mv "matrix.ec" "$FILEBASE-matrix.ec"
    mv "output.bus" "$FILEBASE-output.bus"
    mv "run_info.json" "$FILEBASE-run_info.json"
    mv "transcripts.txt" "$FILEBASE-transcripts.txt"
done
#
Could you please help me fix this code so that it loops kallisto bus over all of my samples? Thank you so much.
Thank you for your guidance. I am not sure whether I added echo correctly or in the right places, but this is what I tried after adding echo:
But it produced this error: read -bash: FW_READ: command not found
Skip the echo before the find command; otherwise the loop won't read in the FW_READ and RV_READ variables. Just echo the variable names and the kallisto command. This will show you the commands that would be run.
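A dry run along these lines might look like the sketch below. It only prints each command, so nothing is executed. The function name print_bus_commands and its three arguments are made up for illustration; the file name pattern is passed in so that different patterns can be tried.

```shell
# Dry run: print the kallisto bus command for each R1/R2 pair without running it.
# print_bus_commands and its arguments are illustrative, not part of kallisto.
# Example call (hypothetical paths): print_bus_commands ./fastqs index.idx "*_001.fastq.gz"
print_bus_commands() {
    # $1 = folder with FASTQ files, $2 = kallisto index, $3 = file name pattern
    find "$1" -name "$3" | sort | while read -r FW_READ
    do
        read -r RV_READ
        echo "FW_READ=$FW_READ"
        echo "RV_READ=$RV_READ"
        echo kallisto bus -i "$2" -x 10xv2 -o . -t 16 "$FW_READ" "$RV_READ"
    done
}
```

If this prints nothing at all, the loop body never ran, which means find did not list any files for that folder and pattern.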
You can also use folders to place the output for each file.
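As a rough sketch of the folder idea: each R1/R2 pair gets its own output folder named after the shared file prefix, so no renaming with mv is needed. The names run_bus_per_pair and KALLISTO and the argument order are assumptions for illustration, and the pattern assumes files ending in _R1_001.fastq.gz and _R2_001.fastq.gz as in the tutorial.

```shell
# Sketch: one output folder per R1/R2 pair, named after the shared file prefix.
# KALLISTO and run_bus_per_pair are illustrative names, not part of kallisto.
KALLISTO="${KALLISTO:-kallisto}"

run_bus_per_pair() {
    # $1 = folder with FASTQ files, $2 = kallisto index, $3 = parent folder for outputs
    find "$1" -name "*_R[12]_001.fastq.gz" | sort | while read -r FW_READ
    do
        read -r RV_READ
        FILEBASE=$(basename "${FW_READ%_R1_001.fastq.gz}")
        mkdir -p "$3/$FILEBASE"
        # stdin is redirected so the command cannot consume the file list fed to the loop
        "$KALLISTO" bus -i "$2" -x 10xv2 -o "$3/$FILEBASE" -t 16 \
            "$FW_READ" "$RV_READ" < /dev/null
    done
}
```

Setting KALLISTO=echo before calling the function turns it into a dry run that still creates the folders but only prints the commands.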
Thanks a lot. I ran it exactly as you suggested; however, it produced nothing, not even an error. It just returned to the Linux command prompt ($).
That indicates that the find command is not finding anything. Check the path and the filename pattern; "*_001.fastq.gz" might work better.
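The likely reason the original pattern finds nothing is that a bracket expression such as [R1R2] matches exactly one character (R, 1 or 2), not the two-character strings R1 or R2. The small check below illustrates this with the shell's own pattern matching, which follows the same rules as find -name for these patterns. The helper name matches is made up for this example.

```shell
# A bracket expression matches exactly ONE character from the set.
# matches is an illustrative helper: $1 = pattern, $2 = file name.
matches() {
    case "$2" in
        $1) echo yes ;;
        *)  echo no ;;
    esac
}

matches "*_[R1R2]_001.fastq.gz" "bamtofastq_S1_L001_R1_001.fastq.gz"   # no
matches "*_R[12]_001.fastq.gz"  "bamtofastq_S1_L001_R1_001.fastq.gz"   # yes
matches "*_001.fastq.gz"        "bamtofastq_S1_L001_R2_001.fastq.gz"   # yes
```

The second pattern, "*_R[12]_001.fastq.gz", is a stricter alternative to "*_001.fastq.gz" that only accepts R1 and R2 files.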
Thanks. The result of running the code with "*_001.fastq.gz" (instead of "*_[R1R2]_001.fastq.gz") is as below:
Now it shows the correct fastq.gz files, but it did not run kallisto bus.
You can also make two directories, one for R1 and one for R2.
Hope this helps.
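One way the two-directory layout could be used is sketched below: loop over the R1 folder and look up the mate with the same prefix in the R2 folder. The function name pair_from_dirs is hypothetical, and the sketch assumes the files keep the _R1_001.fastq.gz and _R2_001.fastq.gz endings used in the tutorial.

```shell
# Sketch: pair reads stored in two separate folders by their shared prefix.
# pair_from_dirs is an illustrative name; it prints "R1-file R2-file" per pair.
pair_from_dirs() {
    # $1 = folder with R1 files, $2 = folder with R2 files
    for FW_READ in "$1"/*_R1_001.fastq.gz
    do
        name=$(basename "$FW_READ")
        RV_READ="$2/${name%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ -f "$RV_READ" ]; then
            echo "$FW_READ $RV_READ"
        else
            # Report unpaired files on stderr instead of passing them on
            echo "missing mate for $FW_READ" >&2
        fi
    done
}
```

Each printed line could then be fed to a kallisto bus call in place of the find | sort pairing, with the benefit that a missing mate is reported rather than silently shifting all later pairs.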
Thank you very much for your help.
I also found another way to run kallisto bus on many fastq files at once, as below:
Thank you.