Kallisto abundance.tsv
4 months ago
sansan_96

Hello, I am running Kallisto for the first time and I am not sure I am using the correct command. I pass six samples, but in the end it generates only a single abundance.tsv file, and that file has no columns corresponding to the samples I supplied. Is this normal, or should I get six .tsv files?

My code:

    user:~/project_2023/kallisto_analysis/kallisto_quantification$ kallisto quant -i ../kallisto_index/transcripts.idx -o output --single -l 200 -s 20 ../../trimming_data/*.fastq.gz

    [quant] fragment length distribution is truncated gaussian with mean = 200, sd = 20
    [index] k-mer length: 31
    [index] number of targets: 72,539
    [index] number of k-mers: 59,116,432
    [index] number of equivalence classes: 202,726
    [quant] running in single-end mode
    [quant] will process file 1: ../../trimming_data/SRR22164928_T.fastq.gz
    [quant] will process file 2: ../../trimming_data/SRR22164929.fastq.gz
    [quant] will process file 3: ../../trimming_data/SRR22164930.fastq.gz
    [quant] will process file 4: ../../trimming_data/SRR22164931.fastq.gz
    [quant] will process file 5: ../../trimming_data/SRR22164932.fastq.gz
    [quant] will process file 6: ../../trimming_data/SRR22164933.fastq.gz
    [quant] finding pseudoalignments for the reads ... done
    [quant] processed 573,962,680 reads, 486,416,913 reads pseudoaligned
    [   em] quantifying the abundances ... done
    [   em] the Expectation-Maximization algorithm ran for 1,579 rounds

The output (abundance.tsv) looks like this:

    user:~/project_2023/trimming_data/output$ head abundance.tsv


4 months ago

There will be one abundance.tsv file for each sample that you process. To import and normalise these for downstream interpretation, I encourage you to take a look at the DESeq2 vignette: https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#quick-start
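Before importing, it can help to confirm that every sample really produced its own file. The sketch below assumes each sample was quantified into its own subdirectory (output/<sample>/abundance.tsv); `list_abundance_files` is a hypothetical helper name, not part of kallisto. The printed paths are the per-sample files you would then import, e.g. with tximport as described in the DESeq2 vignette.

```shell
# Hypothetical helper: print one abundance.tsv path per sample directory,
# assuming the layout <out_root>/<sample>/abundance.tsv.
# Returns non-zero if any sample directory lacks an abundance.tsv.
list_abundance_files() {
  local out_root="$1"
  local rc=0 dir
  for dir in "$out_root"/*/; do
    [ -d "$dir" ] || continue          # glob matched no directories
    dir="${dir%/}"                     # drop the trailing slash
    if [ -f "$dir/abundance.tsv" ]; then
      printf '%s\n' "$dir/abundance.tsv"
    else
      printf 'missing: %s/abundance.tsv\n' "$dir" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

With six samples processed separately, `list_abundance_files output` should print six paths; a single file directly inside output/ (as in the question) prints nothing, because there are no per-sample subdirectories.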

As the analyst, the onus is on you to take this project forward.

See you soon,


Thanks. I understand that I should get six files at the end of the process, but I only get one even though I supplied six FASTQ files.

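When all six files are passed to a single kallisto quant call, they are treated as one sample: the reads are pooled, which is effectively the same as concatenating the six FASTQ files into one and quantifying that.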

I guess the confusion comes from how the OP specified the input: the glob ../../trimming_data/*.fastq.gz expands to all six FASTQ files, which are then passed to a single kallisto quant run.

ibq.enriquepola

The kallisto manual says:

only supply one sample at a time to kallisto. The multiple FASTQ (pair) option is for users who have samples that span multiple FASTQ files.

Since you have independent SRR numbers, I assume these are separate samples, which is why you were expecting multiple abundance files. To get them, run kallisto quant once per sample, each with its own output directory.
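A minimal sketch of running one quantification per sample, assuming each sample is a single single-end FASTQ file and reusing the `-l 200 -s 20` values from the question. The function name `quant_each_sample` and the output/<sample> layout are hypothetical choices for illustration, not part of kallisto itself.

```shell
# Hypothetical helper: run kallisto quant once per sample so that each
# sample gets its own output directory and its own abundance.tsv.
# Assumes one single-end FASTQ per sample; -l/-s are taken from the question.
quant_each_sample() {
  local index="$1"      # kallisto index, e.g. ../kallisto_index/transcripts.idx
  local fastq_dir="$2"  # directory containing one *.fastq.gz per sample
  local out_root="$3"   # parent directory for the per-sample outputs
  local fq sample
  for fq in "$fastq_dir"/*.fastq.gz; do
    [ -e "$fq" ] || continue                 # glob matched no files
    sample=$(basename "$fq" .fastq.gz)       # e.g. SRR22164929
    mkdir -p "$out_root/$sample"
    # One FASTQ per call: this file is quantified as its own sample.
    kallisto quant -i "$index" -o "$out_root/$sample" \
      --single -l 200 -s 20 "$fq" || return 1  # stop at the first failure
  done
}
```

From the kallisto_quantification directory, this would be called as `quant_each_sample ../kallisto_index/transcripts.idx ../../trimming_data output`, giving output/SRR22164929/abundance.tsv and so on. Because the sample name is taken from the file name, the first file in the question would be written to output/SRR22164928_T.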


