Question

How to use kallisto correctly?

0

21 months ago

pavelasquezv ▴ 50

Hi all,

I hope you are well!

I am just starting to use kallisto to get TPM values and I have some questions:

1. I understand that -l and -s should be set to the mean and standard deviation of the fragment length, but I don't have that information because I'm mining NCBI data. I read in other posts here that the values I used are the most common ones, but I can't find a deeper justification for them. Do you have any suggestions?
2. I would like to have the TPM results for each gene. However, the results come out in a different format, and I can't find a table that contains both identifiers (LOCXXXXXX and NW_XXXXXX) so that I can perform an inner join. Do you have any suggestions, please?
3. How can I change the TPM column name in the output table to the SRA code (SRRXXXXX) of the input file?

Thank you very much in advance for your help, friends!

All the best!

This is the code:

kallisto quant -i k_index -o results --single \
-l 200 -s 30 -t 12 SRR9594314.fastq \
-gtf GCF_002156985.1_Harm_1.0_genomic.gtf

[Screenshots: two images of the kallisto results and one image of the desired results]

kallisto RNA-seq • 2.1k views

1

What are these target IDs, are you quantifying against chromosomes?

ADD REPLY • link 21 months ago by ATpoint 81k

0

Hi ATpoint, many thanks for your reply! I don't know why the results come out at the chromosome level, because I'm interested in counting at the gene level.

ADD REPLY • link 21 months ago by pavelasquezv ▴ 50

score 1 · Answer 1 · 2022-07-18

1

21 months ago

ATpoint 81k

If you do not have this information, you can use common defaults, such as an average fragment length of 250 bp and a standard deviation of 25. Why? Because most RNA-seq library prep kits produce fragment sizes that are roughly in this range. The fragment size distribution is not exactly Gaussian but close to it, with a slight skew. I guess in most situations going with these defaults should be fine.
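To make the suggestion above concrete, here is a minimal sketch of how the single-end call from the question might look with those defaults. The index name, output directory, thread count and FASTQ file are taken from the question; the variable names are only for illustration. The `-gtf` argument from the original command is left out, on the assumption that a GTF is only relevant together with `--genomebam`. The command is printed rather than executed, so it can be inspected first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed defaults when the true fragment-length distribution is unknown.
mean_len=250   # -l: estimated average fragment length
sd_len=25      # -s: estimated standard deviation of the fragment length

# Build the command as an array so that every argument stays intact.
cmd=(kallisto quant -i k_index -o results --single
     -l "$mean_len" -s "$sd_len" -t 12 SRR9594314.fastq)

# Dry run: show the command instead of executing it.
printf '%s ' "${cmd[@]}"
printf '\n'

# To actually run it (requires kallisto and a transcriptome index):
# "${cmd[@]}"
```

Both values only matter for single-end reads; with paired-end data kallisto estimates the fragment length distribution from the reads themselves.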

I am a salmon user rather than a kallisto user, but both produce transcript-level TPMs, not gene-level ones. Does kallisto have an option to output gene-level values? If not, you may want to use tximport to summarize the estimates to the gene level and then calculate TPMs from the returned gene-level counts, along the lines of the post "Raw counts to TPM in R".
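As a rough illustration of the gene-level step, the sketch below does something simpler than tximport: it sums the transcript-level `tpm` column of `abundance.tsv` per gene, using a two-column transcript-to-gene table. Note that this is a swapped-in shortcut, not the tximport workflow described above. The transcript IDs, gene IDs, numbers and the `tx2gene.tsv` file are made up for the example; only the `abundance.tsv` column layout and the run accession come from the thread. The output column is named after the SRA run, which also covers the renaming question.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy inputs, written to a temporary directory.
workdir=$(mktemp -d)
{
    printf 'target_id\tlength\teff_length\test_counts\ttpm\n'
    printf 'tx1\t1000\t801\t10\t100.0\n'
    printf 'tx2\t2000\t1801\t30\t250.5\n'
    printf 'tx3\t1500\t1301\t5\t50.0\n'
} > "$workdir/abundance.tsv"
# Transcript-to-gene map (assumed to be derived from the annotation).
printf 'tx1\tLOC0001\ntx2\tLOC0001\ntx3\tLOC0002\n' > "$workdir/tx2gene.tsv"

sample="SRR9594314"   # run accession used as the column name

{
    # Header: gene ID plus one column named after the sample.
    printf 'gene_id\t%s\n' "$sample"
    # First file fills the transcript -> gene map, second file is summed.
    awk -F'\t' -v OFS='\t' '
        NR == FNR    { gene[$1] = $2; next }   # tx2gene.tsv
        FNR == 1     { next }                  # skip abundance.tsv header
        ($1 in gene) { tpm[gene[$1]] += $5 }   # add transcript TPM to its gene
        END          { for (g in tpm) print g, tpm[g] }
    ' "$workdir/tx2gene.tsv" "$workdir/abundance.tsv" | sort
} > "$workdir/gene_tpm.tsv"

cat "$workdir/gene_tpm.tsv"
```

Tables produced this way for several runs share the `gene_id` column, so they can be joined on it afterwards. For differential expression work, tximport on the estimated counts remains the more careful route.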

0

Hi ATpoint, many thanks again! I am not a kallisto user. The manual indicates the following: "abundance.tsv is a plaintext file of the abundance estimates. It does not contain bootstrap estimates. Please use the --plaintext mode to output plaintext abundance estimates. Alternatively, kallisto h5dump can be used to output an HDF5 file to plaintext. The first line contains a header for each column, including estimated counts, TPM, effective length." I will probably have to use the tool you mention to get the TPM values at the gene level, which is what interests me for my study.

ADD REPLY • link 21 months ago by pavelasquezv ▴ 50

score 1 · Answer 2 · 2022-07-19

1

21 months ago

dsull ★ 5.8k

First problem I see: you have targets that are over 3 million bases long. Therefore, you are not doing transcriptome mapping which is what kallisto (and the TPM quantification) is designed for.
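A quick way to check for this is to look at the `length` column of `abundance.tsv`. The sketch below uses made-up target names and numbers, and the 1,000,000-base cutoff is an arbitrary assumption for illustration, not a kallisto rule. The commented commands at the end show how an index is typically built from a transcript FASTA; `transcripts.fa` is a placeholder name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical abundance.tsv with scaffold-sized targets.
workdir=$(mktemp -d)
abundance="$workdir/abundance.tsv"
{
    printf 'target_id\tlength\teff_length\test_counts\ttpm\n'
    printf 'NW_EXAMPLE_1\t3200000\t3199801\t1500\t600000\n'
    printf 'NW_EXAMPLE_2\t850000\t849801\t400\t400000\n'
} > "$abundance"

# Longest target in the table (column 2), skipping the header line.
max_len=$(awk -F'\t' 'NR > 1 && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }' "$abundance")

# Assumed cutoff: targets in the megabase range point to genomic sequence.
threshold=1000000
if [ "$max_len" -gt "$threshold" ]; then
    verdict="genome-like"
else
    verdict="transcript-like"
fi
echo "longest target: $max_len bases ($verdict)"

# If the targets look genome-like, rebuild the index from transcript
# sequences instead (placeholder file name) and rerun the quantification:
# kallisto index -i k_index transcripts.fa
```

With an index built from transcripts, the `target_id` column holds transcript identifiers, which is what a transcript-to-gene table needs for the join asked about in the question.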

0

Hi dsull, many thanks for your reply! Yes, you are right, but I don't know why that happens. Maybe I didn't build the genome index correctly. I will verify!

ADD REPLY • link 21 months ago by pavelasquezv ▴ 50