Kallisto output No target IDs
3.4 years ago
julie_dddd ▴ 10

I have some Kallisto output and want to run tximport on the abundance.tsv file. However, I don't understand why the output has no target IDs, even though I ran the command as the manual prescribes:

kallisto quant -b 10 -i Homo_sapiens_index -o ERX3307771_quant -t 1 --single -l 51 -s 1 -g Homo_sapiens.GRCh38.101.gtf ERX3307771.fastq

The output does not contain the target IDs I expected: https://ibb.co/JtcdBxh

I think the target IDs must not be 0, 1, 2, 3, ... for tximport to work.

Does someone know what could be wrong here?

(I'm very new to bioinformatics)

Kindest regards, Julie

kallisto RNA-Seq • 1.2k views

Cross-posted on Bioconductor: https://support.bioconductor.org/p/p132885/

3.4 years ago

The target IDs should be taken from the FASTA header names, i.e., the headers of the FASTA file from which you built the Kallisto index and against which your reads were pseudo-aligned to estimate abundance. Can you check which FASTA this was and what its headers look like?

Regardless, with tximport you can still supply annotation via the tx2gene parameter, although creating this mapping may involve some extra work. See here: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#kallisto_with_TSV_files
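As a rough illustration of that extra work: tx2gene is a two-column table linking transcript IDs (column 1) to gene IDs (column 2), and the transcript IDs have to match the target IDs in abundance.tsv. A minimal sketch of deriving such a table from a GTF, assuming the Ensembl attribute layout (gene_id, transcript_id, transcript_version) and versioned transcript IDs such as ENST00000456328.2; the example record and the file names are illustrative:

```shell
# Write a one-line example in Ensembl GTF layout (tab-separated, illustrative).
tmpdir=$(mktemp -d)
printf '1\thavana\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2";\n' > "$tmpdir/example.gtf"

# For every transcript record, print "transcript_id.version<TAB>gene_id".
awk -F '\t' '
function attr(s, key,    re) {
    # Pull the quoted value of one key out of the attribute column.
    re = key " \"[^\"]+\""
    if (match(s, re))
        return substr(s, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
    return ""
}
$3 == "transcript" {
    print attr($9, "transcript_id") "." attr($9, "transcript_version") "\t" attr($9, "gene_id")
}' "$tmpdir/example.gtf" > "$tmpdir/tx2gene.tsv"

cat "$tmpdir/tx2gene.tsv"
```

The resulting TSV could then be read into R as a data frame and passed as tx2gene.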

Kevin


My FASTA looks as follows: https://ibb.co/MgVW1BS

I think I can't just remove the header?


Ah, I see: Kallisto stops reading the header at the first space, so only the leading index number is kept as the target ID. This has been reported previously but never answered: https://github.com/pachterlab/kallisto/issues/116
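To preview which names this truncation produces, the headers can be cut at the first space before the index is built. A minimal sketch, assuming Kallisto keeps only the text before the first space as described above; the temporary file and its two records are illustrative:

```shell
# Recreate a tiny FASTA with the same header layout (illustrative data).
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.fasta" <<'EOF'
>0 ENST00000456328.2 chr1+ 11869-12227,12613-12721,13221-14409
TTTGGCC
>1 ENST00000456329.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGCATGC
EOF

# Keep header lines, strip the leading ">", and cut at the first space.
# Assumption: this mimics the name that ends up in the target_id column.
ids=$(grep '^>' "$tmpdir/test.fasta" | sed 's/^>//' | cut -d ' ' -f 1)
echo "$ids"
```

If this prints index numbers rather than transcript IDs, the headers need tidying before the index is built.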

You may want to tidy the headers by removing the spaces or removing that first index number.

You can tidy them in different ways, assuming that the header format is consistent across all entries:

cat test.fasta 
>0 ENST00000456328.2 chr1+ 11869-12227,12613-12721,13221-14409
TTTGGCC
>1 ENST00000456329.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGCATGC
AAA
TTT
>2 ENST00000456330.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGC
ATGC
>3 ENST00000456331.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGC

awk

awk -F " " '/^>/ {print ">"$2}; !/^>/ {print}' test.fasta 
>ENST00000456328.2
TTTGGCC
>ENST00000456329.2
ATGCATGC
AAA
TTT
>ENST00000456330.2
ATGC
ATGC
>ENST00000456331.2
ATGC

sed

sed 's/^>[0-9]* />/g' test.fasta 
>ENST00000456328.2 chr1+ 11869-12227,12613-12721,13221-14409
TTTGGCC
>ENST00000456329.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGCATGC
AAA
TTT
>ENST00000456330.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGC
ATGC
>ENST00000456331.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGC
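
Before rebuilding the index from the tidied FASTA, it may be worth confirming that the new header names are unique, since they become the target IDs. A minimal sketch using the sed approach; the file names and the two records are illustrative:

```shell
# Recreate a tiny FASTA with the original header layout (illustrative data).
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.fasta" <<'EOF'
>0 ENST00000456328.2 chr1+ 11869-12227,12613-12721,13221-14409
TTTGGCC
>1 ENST00000456329.2 chr1+ 11869-12227,12613-12721,13221-14409
ATGCATGC
EOF

# Drop the leading index number from each header.
sed 's/^>[0-9]* />/' "$tmpdir/test.fasta" > "$tmpdir/tidy.fasta"

# Take the first header field of each record and report any name seen more than once.
dups=$(grep '^>' "$tmpdir/tidy.fasta" | cut -d ' ' -f 1 | sort | uniq -d)
if [ -z "$dups" ]; then
    echo "all target IDs are unique"
else
    echo "duplicated IDs:"
    echo "$dups"
fi
```

The index then has to be rebuilt from the tidied FASTA and kallisto quant rerun for the new IDs to appear in abundance.tsv.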