Code for iteratively executing an alignment package (Kallisto) on multiple files in a directory and uniquely creating an output file
8.0 years ago
achamess ▴ 90

Hi,

I'm a novice at RNA-Seq data analysis, and I have a little experience using the command line to do things on my computer.

I'm using Kallisto to align fastq files for RNA-Seq. I know that it is possible to just write a shell script to apply Kallisto to all the fastq files I have in a folder, but I'm not exactly sure how to do this.

In pseudocode, I want to do this:

for [each file] in [directory]; do kallisto quant file_x > file_x_aligned; done

Where I'm stuck is in naming each of the output files uniquely. I'm sure there is an easy way to do this, but it's not coming to me. Sorry for the noob question. Any help would be greatly appreciated.

RNA-Seq • 5.0k views

See this thread for ideas: "bash loop for alignment RNA-seq data". @Ram's solution there explains how you can grab parts of the sample file name and use them to name the output.


Thank you. I think I may have made a workable solution.

for file in *.fastq; do kallisto quant -i transcriptome.idx --single -l 300 -s 20 -b 100 -o "${file}-aligned" "$file"; done

Your solution seems fine. What I find most intuitive is something like this (hypothetical example):

for file in *.fastq
do
    # Swap the .fastq suffix for .kallisto; other manipulations with sed, tr, or cut are possible
    outname=$(echo "$file" | sed 's/\.fastq$/.kallisto/')
    kallisto quant -i transcriptome.idx --single -l 300 -s 20 -b 100 -o "$outname" "$file"
done
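
The same renaming can also be done with the shell's own parameter expansion instead of sed. Below is a minimal sketch, assuming single-end reads named `*.fastq` in the current directory and the same index and options as above. `out_name` is a hypothetical helper written for this example, not part of kallisto, and the loop only prints the commands (a dry run) so you can check the names before running anything.

```shell
#!/usr/bin/env bash

# out_name: hypothetical helper (not part of kallisto) that turns a FASTQ
# file name into a unique output directory name, e.g.
# data/sample1.fastq.gz becomes sample1.kallisto
out_name() {
    base="${1##*/}"        # drop any leading directory
    base="${base%.gz}"     # drop a trailing .gz, if present
    base="${base%.fastq}"  # drop a trailing .fastq, if present
    base="${base%.fq}"     # drop a trailing .fq, if present
    printf '%s.kallisto\n' "$base"
}

# Dry run: print each command instead of executing it.
# Remove the leading "echo" once the output names look right.
for file in *.fastq; do
    [ -e "$file" ] || continue   # nothing to do if the glob matched no files
    echo kallisto quant -i transcriptome.idx --single -l 300 -s 20 -b 100 \
        -o "$(out_name "$file")" "$file"
done
```

Since `-o` names an output directory, each sample ends up in its own folder, so the result files inside never collide.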
7.7 years ago
lazappi • 0

For anyone who doesn't want to build their own solution, I have written a Python script to do this: https://github.com/lazappi/binf-scripts/blob/master/kallistoMulti.py
