Question: bash loop to Python loop for Google Colab
Hi there,

I have a bash loop that runs kallisto on my sequencing samples, but I'm now trying to run kb on Google Colab (all the fastq files are on Google Drive), so I'd appreciate a hand 'translating' it to Python. Here's the original script:

FASTQ="/content/drive/My Drive/NovaSeq_raw_data/"
for FILE in $(ls *.fastq.gz | rev | cut -c 24- | rev | cut -c 14- | uniq)
do
  echo "kallisto for $FILE"
  kallisto quant -i $index.idx -o $FILE -b 100 "${FASTQ}WTCHG_702270_${FILE}_1.fastq.gz" "${FASTQ}WTCHG_702270_${FILE}_2.fastq.gz" "${FASTQ}WTCHG_705748_${FILE}_1.fastq.gz" "${FASTQ}WTCHG_705748_${FILE}_2.fastq.gz"
done
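The `$(...)` in the loop header strips a fixed number of characters from each file name: `rev | cut -c 24- | rev` drops the last 23 and `cut -c 14-` drops the first 13. One subtlety when porting: `uniq` only removes adjacent repeats, so if the same key appears under two different 13-character prefixes (as the four-file kallisto command suggests), `ls` sorts those names apart and the loop can see that key more than once, whereas a Python set keeps it once. A small emulation, using made-up names (the `RUN_...` prefixes and `t` padding are placeholders, not the real NovaSeq file names):

```python
HEAD, TAIL = 13, 23  # characters dropped by "cut -c 14-" and by "rev | cut -c 24- | rev"


def bash_pipeline(lines):
    """Emulate `rev | cut -c 24- | rev | cut -c 14- | uniq` line by line."""
    out = []
    for s in lines:
        s = s[::-1][TAIL:][::-1]  # rev | cut -c 24- | rev: drop the last 23 characters
        s = s[HEAD:]              # cut -c 14-: drop the first 13 characters
        if not out or out[-1] != s:  # uniq drops only *adjacent* repeats
            out.append(s)
    return out


def slice_keys(lines):
    """Same truncation as a single slice; a set removes every repeat, not just adjacent ones."""
    return {s[HEAD:-TAIL] for s in lines}


# Placeholder names: 13-character prefix + sample key + 23-character tail.
prefixes = ["RUN_AAAAAAAA_", "RUN_BBBBBBBB_"]
tails = ["_" + "t" * 11 + "_1.fastq.gz", "_" + "t" * 11 + "_2.fastq.gz"]
# Sorted roughly the way `ls` would list them.
names = sorted(p + key + tail for p in prefixes for key in ("S1", "S2") for tail in tails)

print(bash_pipeline(names))       # → ['S1', 'S2', 'S1', 'S2']
print(sorted(slice_keys(names)))  # → ['S1', 'S2']
```

Both approaches yield the same set of keys; the difference is only how often each key is visited.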
Untested, but perhaps it might be of use:

#!/usr/bin/env python3

import os
import subprocess

# redefine globals, as needed
index = "12345"                                           # index path without the ".idx" extension
fastq_dir = "/content/drive/My Drive/NovaSeq_raw_data/"   # no "\ " escape: Python keeps the space as-is

# build a set of sample keys, truncated per the rev-cut-rev-cut-uniq pipeline
samples = set()
for name in os.listdir(fastq_dir):
  if name.endswith(".fastq.gz"):
    samples.add(name[13:-23])      # i.e., "rev | cut -c 24- | rev | cut -c 14-"; the set plays the role of "uniq"

# walk through the sample keys; call a kallisto subprocess on each
for sample in sorted(samples):
  print("kallisto for {}".format(sample))
  reads = [os.path.join(fastq_dir, "WTCHG_{}_{}_{}.fastq.gz".format(run, sample, mate))
           for run in ("702270", "705748") for mate in (1, 2)]
  cmd = ["kallisto", "quant", "-i", "{}.idx".format(index), "-o", sample, "-b", "100"] + reads
  subprocess.run(cmd, check=True)  # argument list, no shell=True: "My Drive" needs no quoting
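Because the sample keys come from fixed character offsets (13 from the front, 23 from the back), a mismatch between those offsets and the real file names only surfaces when kallisto fails to open a file. A possible pre-flight check, assuming the four-file naming from the kallisto command (`WTCHG_702270_`/`WTCHG_705748_` prefixes, `_1`/`_2` mates); `read_paths` and `missing_reads` are hypothetical helper names:

```python
import os
import tempfile

RUNS = ("702270", "705748")  # run prefixes taken from the kallisto command in the question


def read_paths(fastq_dir, sample):
    """Build the four FASTQ paths the kallisto command expects for one sample key."""
    return [
        os.path.join(fastq_dir, "WTCHG_{}_{}_{}.fastq.gz".format(run, sample, mate))
        for run in RUNS
        for mate in (1, 2)
    ]


def missing_reads(fastq_dir, sample):
    """Return the expected paths that are not regular files (empty list means all present)."""
    return [path for path in read_paths(fastq_dir, sample) if not os.path.isfile(path)]


# Self-check in a throwaway directory: create the four files for a
# hypothetical sample "S1" and none for "S2".
with tempfile.TemporaryDirectory() as tmp:
    for path in read_paths(tmp, "S1"):
        open(path, "w").close()
    missing_s1 = missing_reads(tmp, "S1")
    missing_s2 = missing_reads(tmp, "S2")

print(missing_s1)       # → []
print(len(missing_s2))  # → 4
```

Inside the real loop, calling `missing_reads(fastq_dir, sample)` before launching kallisto and skipping (with a printed warning) any key whose list is non-empty keeps one mis-truncated name from stopping the whole run.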

Consider adding print(variable_name) at various points to check what the variables look like, so that you can be sure things are formatted properly before running the kallisto command.
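Along those lines, one option is a dry-run switch that prints each fully quoted command instead of executing it. This is a sketch: `kallisto_cmd` and `DRY_RUN` are hypothetical names, "S1" is a placeholder sample key, and `shlex.join` needs Python 3.8 or later.

```python
import os
import shlex
import subprocess

DRY_RUN = True  # flip to False once the printed commands look right


def kallisto_cmd(index, fastq_dir, sample):
    """Argument list for one `kallisto quant` call, mirroring the question's command."""
    reads = [
        os.path.join(fastq_dir, "WTCHG_{}_{}_{}.fastq.gz".format(run, sample, mate))
        for run in ("702270", "705748")
        for mate in (1, 2)
    ]
    return ["kallisto", "quant", "-i", index + ".idx", "-o", sample, "-b", "100"] + reads


cmd = kallisto_cmd("12345", "/content/drive/My Drive/NovaSeq_raw_data/", "S1")
if DRY_RUN:
    print(shlex.join(cmd))  # paths containing spaces are printed single-quoted
else:
    subprocess.run(cmd, check=True)
```

Printing via `shlex.join` shows exactly how each argument would look if pasted into a shell, which makes stray backslashes or missing spaces easy to spot.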

