How to get different outputs for kallisto on multiple samples?
4.1 years ago
Biologist ▴ 290

I'm trying to run kallisto quant on fastq.gz files from multiple samples, but the output directory contains only one abundance.tsv file, along with one .json and one .h5 file.

#!/bin/bash

#SBATCH --job-name=Kallisto
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G
#SBATCH --time=1-00:00:00
#SBATCH --qos=1day
#SBATCH --output=kallisto_%A_%a.out
#SBATCH --error=kallisto_%A_%a.err 
#SBATCH --array=1-50%10


LBID=$(head -$SLURM_ARRAY_TASK_ID path/samples.txt | tail -1)

module load kallisto/0.43.1-goolf-1.7.20

dir="/usr/allSamples"
kallisto quant -i kallisto_GRCh38.p10_gencodev27.idx -o output --rf-stranded ${dir}/$LBID.1.fastq.gz ${dir}/$LBID.2.fastq.gz

All of the samples' fastq.gz files are in the /usr/allSamples directory, and samples.txt looks like this:

samples.txt:

Sample100
Sample101
Sample103
Sample178

And the fastq.gz files in /usr/allSamples directory:

Sample100.1.fastq.gz
Sample100.2.fastq.gz
Sample101.1.fastq.gz
Sample101.2.fastq.gz
Sample103.1.fastq.gz
Sample103.2.fastq.gz
Sample178.1.fastq.gz
Sample178.2.fastq.gz

How can I get separate abundance.tsv, .json and .h5 files for each sample, and how can I merge all the outputs into a single file?

Any help is appreciated. Thank you.

kallisto RNA-Seq geneexpression slurm sbatch • 3.8k views
4.1 years ago

It looks like your code overwrites the same output files every time it processes a new pair of fastqs, because -o always points to the same directory (output). Make sure the sample name is part of the output name, so that each sample gets written to its own location.


Yes, I know this, but I'm not sure how to put the sample name into the output name. Can you please tell me?


You need to parametrize the value you pass to -o the same way you parametrize the input fastq file names.


But -o is the output directory that holds the abundance.tsv, abundance.h5 and run_info.json output files. So if I put $LBID in place of output, it will create Sample100, Sample101, etc. directories, each with its own output files. Am I right?


Since the program produces output with identical file names each time, you have no option but to segregate the output of different samples into different directories.
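As a rough illustration of that approach, here is a minimal sketch of the job script body with -o pointing at a per-sample directory. The #SBATCH header lines stay as in the original script. The out_root variable and the sample_outdir helper are names made up for this example, and sed -n is used in place of head | tail only as a matter of taste.

```shell
#!/bin/bash
# Sketch: one output directory per sample, named after the sample ID.
# The #SBATCH directives from the original script would go here, unchanged.

dir="/usr/allSamples"
out_root="output"   # hypothetical parent directory for all per-sample results

# Build the per-sample output path, e.g. output/Sample100
sample_outdir() {
    printf '%s/%s\n' "$out_root" "$1"
}

# Only run the quantification inside a SLURM array task
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    # Pick the line of samples.txt that matches this array task
    LBID=$(sed -n "${SLURM_ARRAY_TASK_ID}p" path/samples.txt)
    outdir=$(sample_outdir "$LBID")
    mkdir -p "$outdir"

    kallisto quant -i kallisto_GRCh38.p10_gencodev27.idx -o "$outdir" --rf-stranded \
        "${dir}/${LBID}.1.fastq.gz" "${dir}/${LBID}.2.fastq.gz"
fi
```

With the samples.txt shown in the question, this should leave output/Sample100/abundance.tsv, output/Sample101/abundance.tsv and so on, each next to its own abundance.h5 and run_info.json.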


You already know how to incorporate a variable part and a constant part when specifying a file name. It's the same.

If you did this by copy-pasting what someone else did, all I can say is that you are going to struggle mightily unless you learn to understand what you are doing, so that you can adapt it. No one wants to write your custom command lines for you. You have to learn the principles for yourself.
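The second part of the question, merging the per-sample outputs into a single file, is not answered in the thread. One possible sketch, assuming each sample's results sit in their own subdirectory of a common parent (here called output, an assumed name) and that abundance.tsv has the usual tab-separated columns target_id, length, eff_length, est_counts and tpm: concatenate the files into one long table with the sample name as an extra first column. The merge_abundance function is a hypothetical helper, not part of kallisto.

```shell
#!/bin/bash
# Sketch: stack every <out_root>/<sample>/abundance.tsv into one long table,
# prefixing each row with the sample name taken from the directory name.

merge_abundance() {
    local out_root="$1"
    local f sample

    # Single header line for the merged table
    printf 'sample\ttarget_id\tlength\teff_length\test_counts\ttpm\n'

    for f in "$out_root"/*/abundance.tsv; do
        [ -e "$f" ] || continue                   # skip if the glob matched nothing
        sample=$(basename "$(dirname "$f")")      # e.g. Sample100
        # Drop each file's own header (NR>1) and prepend the sample name
        awk -v s="$sample" 'BEGIN { FS = OFS = "\t" } NR > 1 { print s, $0 }' "$f"
    done
}
```

Usage would be along the lines of merge_abundance output > merged_abundance.tsv, run after all array tasks have finished. For downstream analysis in R, dedicated importers such as tximport are probably a better fit than a hand-rolled merge.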
