Question

Kallisto pseudobam to IGV

2

Entering edit mode

6.7 years ago

mac ▴ 70

Hello,

I am analysing some RNA-seq data with kallisto and I have created a series of sam files using --pseudobam in kallisto.

Is there a way of converting these files in a format that can be easily viewed on IGV?

Thank you in advance! :)

RNA-Seq kallisto pseudobam IGV alignment • 7.4k views

ADD COMMENT • link updated 6.2 years ago by mmfansler ▴ 450 • written 6.7 years ago by mac ▴ 70

0

Entering edit mode

In addition..

Is that bam as accurate and useful than those obtained with splicing aware mappers like STAR o HISAT ?

ADD REPLY • link 6.7 years ago by Antonio R. Franco ★ 5.1k

1

Entering edit mode

Not really, no. As the pseudo alignment will only assign transcriptomic reads that it can definitively say belong to a given transcript, and the abundance estimation then uses an EM (expectation maximisation) function to basically fill in the gaps.

ADD REPLY • link 6.7 years ago by andrew.j.skelton73 6.5k

score 2 · Answer 1 · 2017-07-19

2

Entering edit mode

6.7 years ago

andrew.j.skelton73 6.5k

Per @Devon's answer here, you can use samtools to sort your SAM and convert it to BAM.

samtools view -u alignment.sam | samtools sort -@ 4 - output_prefix

Edit: The question is referring to transcriptomic vs genomic mappings for IGV

There's a python script available here found in this answer from SeqAnswers.

Usage:

python3 kallisto_sam_convertor.py pseudoalignment.sam annotation.gtf | samtools view -bS - | samtools sort - -o <output.bam>

bam should be loadable to IGV.

ADD COMMENT • link 6.7 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

Thank you for your reply but my understanding is that the kallisto psedobam (which is actually sam) have coordinates based on the transcriptome used to make the index.

How can I convert these to genomic coordinates so I can make an indexed bam to view in IGV?

ADD REPLY • link 6.7 years ago by mac ▴ 70

0

Entering edit mode

I've updated my answer

ADD REPLY • link 6.7 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

Thank you, I will try this :)

ADD REPLY • link 6.7 years ago by mac ▴ 70

score 2 · Answer 2 · 2018-02-17

Use `--genomebam`

Introduced in kallisto v0.44.0 is a --genomebam flag, which has quant generate a genome aligned BAM. You must provide an annotation GTF file via the --gtf flag. It is also recommended to include a chrom.sizes file with the --chromsomes flag.

No reads showing?

It might be worth mentioning that I wasn't able to immediately use GENCODE's files for mm10, e.g., gene GTF and transcripts FASTA. The kallisto indexer interpreted the full line from the FASTA read headers as the transcript IDs, but since the headers include much more, the pseduobam procedure couldn't resolve them to a transcript ID in the GTF. It still generated the full BAM, but everything was aligned to "*". Reprocessing the FASTA to only include the transcript ID (everything up to first "|") and reindexing on that got everything working.

Another issue I've come across is a mismatch between the chromosome names in the GTF and those in the chrom.sizes file. Kallisto will throw a warning about not finding chromosomes for transcripts if this is the case. A simple test to see if this is the issue is to run without a --chromosomes flag, in which case, kallisto defaults to simply using the chromosome names in the GTF.

Troubleshooting Custom Transcriptomes/GTF

If you are generating your own GTF, it might be worth having a look at some compatibility guidelines I posted over on the kallisto repository. Simple sorting by seqname + start is not sufficient, e.g. exons must be rank-ordered.

score 0 · Answer 3 · 2017-07-28

0

Entering edit mode

6.7 years ago

wajm97 • 0

Is this working properly? pseudobam and the script are not working for me.

ADD COMMENT • link 6.7 years ago by wajm97 • 0

0

Entering edit mode

You'll have to give more detail than that, or contact the author of the script.

ADD REPLY • link 6.7 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

I thought kallisto_sam_convertor.py could add gene annotation and somemore information to kallisto pseudobam output. enter image description here

I got this message.

ADD REPLY • link 6.7 years ago by wajm97 • 0

0

Entering edit mode

and test another GTF file. enter image description here

ADD REPLY • link 6.7 years ago by wajm97 • 0

2

Entering edit mode

Hi, the author is here. I wrote this script for one use only (I was not trying to make a very general resource for everybody, but maybe I should).

The problem is that you got differently formatted gtf file and I am using instead of some implemented parsers simply my own. These are lines parsing a gtf file into two dictionaries of transcript pointing to their position in genome (one dictionary for + strand and one for -):

pos_tr = {}
neg_tr = {}
with open(sys.argv[2],'r') as anotationfile:
    for entry in anotationfile:
        entry = entry.split('\t')
        key = entry[8].split('"')[1]
        if entry[6] == '+':
            pos_tr[key] = entry[3]
        else:
            neg_tr[key] = entry[3]

The quick solution would be just to find how to read the name of transcript and replace entry[8].split('"')[1] by by an expression what will get a transcript name for your version of gtf.

I also found that there are two bits, that will rename your reference to S5_genome. lines 21 - 23 :

if samline[:3] == '@HD':
        print(samline, end='')
                    print("@SQ  SN:S5_genome    LN:6778833")

and line 30

print('S5_genome', end="\t")

This was not intended as general utility, but a solution for a project during genomics course where we wanted to be able to inspect the job of kallisto (that's why the repo is called after the course : sequence a genome).

ADD REPLY • link 6.7 years ago by kamiljaron ▴ 220

Use --genomebam

No reads showing?

Troubleshooting Custom Transcriptomes/GTF

Use `--genomebam`