Issues with chromosome file in kallisto
3 months ago
bioinfo ▴ 50

Hello,

I recently updated the indexes I am using for kallisto from version 93 to 96. I tried to create genomebam files and I used the same chromosome file I was using in the past. However, now I am getting the following warning:

Warning: could not find chromosomes for 491 transcripts
Warning: 25039 transcripts were defined in GTF file, but not in the index


Does that mean that I need to find a new chromosome file? Where can I find a recent one? I am interested in GRCm38. The one we had on file is from 2016. I am using the kallisto index and gtf file from the prebuilt indexes in kallisto.

I am using kallisto 0.44.0. I noticed that some people were having that issue with kallisto 0.46.2 (https://github.com/pachterlab/kallisto/issues/254). I really don't want to change the kallisto version unless I absolutely have to. I also noticed that the chromosome file that I have is the same as the one mentioned in the ticket.

Thank you

3 months ago
dsull ★ 4.0k

The GTF files sometimes don't match the FASTA file (e.g. there may be some things in the GTF that aren't in the FASTA and vice versa).
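
A rough way to see how large that mismatch is would be to compare the transcript IDs in the two files. The sketch below is only an illustration: the function names are made up for this example, it assumes Ensembl-style GTF attributes (`transcript_id "..."`) and FASTA headers that start with the transcript ID, and it strips version suffixes such as `.1` before comparing.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines):
    """Collect transcript IDs (version suffix removed) from GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        match = _TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1).split(".")[0])
    return ids


def fasta_transcript_ids(fasta_lines):
    """Collect transcript IDs (version suffix removed) from FASTA headers."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0].split(".")[0])
    return ids


def compare_ids(gtf_lines, fasta_lines):
    """Return (only_in_gtf, only_in_fasta) as sorted lists."""
    in_gtf = gtf_transcript_ids(gtf_lines)
    in_fasta = fasta_transcript_ids(fasta_lines)
    return sorted(in_gtf - in_fasta), sorted(in_fasta - in_gtf)


# Hypothetical usage (file names are placeholders):
# with open("annotation.gtf") as gtf, open("cdna.fa") as fasta:
#     only_gtf, only_fasta = compare_ids(gtf, fasta)
#     print(len(only_gtf), "transcripts in GTF but not in FASTA")
```

If the first list is about as long as the count in the "defined in GTF file, but not in the index" warning, the warning is probably just reporting this mismatch.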

I recommend using the kb ref command in the kb-python package to get a cDNA FASTA and index that resolve this mismatch. I don't recommend using prebuilt indices -- it's generally better to build your own.

As for the chromosome problem, this is again due to a mismatch: your GTF file probably has scaffolds that don't appear in your chromosomes file.
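
To check which sequence names are responsible, something along these lines could work. It is a minimal sketch with illustrative function names, assuming the chromosomes file is a text file with the chromosome name in the first whitespace-separated column and that the GTF has the sequence name in its first tab-separated column.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names (first column) used in a GTF."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        names.add(line.split("\t", 1)[0])
    return names


def chromosome_names(chrom_lines):
    """Collect the names listed in the first column of the chromosomes file."""
    names = set()
    for line in chrom_lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_seqnames(gtf_lines, chrom_lines):
    """Sequence names present in the GTF but absent from the chromosomes file."""
    return sorted(gtf_seqnames(gtf_lines) - chromosome_names(chrom_lines))


# Hypothetical usage (file names are placeholders):
# with open("annotation.gtf") as gtf, open("chrom.txt") as chrom:
#     for name in missing_seqnames(gtf, chrom):
#         print(name)
```

If the names it prints are scaffolds or patches rather than the main chromosomes, that fits the explanation above.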

In practice, this shouldn't really make a difference.

But, in any case, this is a GTF+FASTA issue, not really a kallisto issue.

Thank you for replying. You said that this shouldn't really make a difference. Does that mean that I can still use the BAM files that were generated?

Yes, correct.

