Question: following Kallisto with DESeq2 using tximport package
4.8 years ago, Assa Yeroslaviz wrote:


I am trying to analyse the results from kallisto with the help of DESeq2. After a long search I found this post. It mentions the package tximport, which I am now trying to run. I have run the complete vignette without difficulties, but when I try to run it on my own data, I get this error message:

> txi <- tximport(files[1:3], type="kallisto", tx2gene=tx2gene, reader=read_tsv)
reading in files
1 2 3
transcripts missing genes: 173259
summarizing abundance
Error in split.default(1:nrow(m), f) :
  group length is 0 but data length > 0

The reason is probably that my kallisto files use Ensembl transcript IDs, while the tximport workflow assumes that UCSC IDs are in place.

My files look like this:

target_id    length    eff_length    est_counts    tpm
ENST00000415118    8    2.33333    0    0
ENST00000448914    13    6    0    0
ENST00000434970    9    3.33333    0    0
ENST00000390577    37    12.3793    14    116.948
ENST00000437320    19    10.1667    0    0

while the list of genes from the tximport workflow looks like this:

> head(df)
1      1 uc002qsd.4
2      1 uc002qsf.2
3     10 uc003wyw.1
4    100 uc002xmj.3
5   1000 uc010xbn.1
6   1000 uc002kwg.2

So I was wondering whether there is a better way of working with the package (in the vignette, a separate list with RefSeq IDs is uploaded to fit the provided kallisto files).
Is there another package besides TxDb.Hsapiens.UCSC.hg19.knownGene that I can use to map my ENST* IDs to ENSG IDs, or even to gene names?

I know I can use biomaRt (this is what I am doing now), but it takes a long time, as my list of transcripts is 173260 rows long.
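
The suspected mismatch can be confirmed before calling tximport by counting how many kallisto `target_id` values occur in the first column of the `tx2gene` table. The following is a minimal base-R sketch, not part of tximport: the helper name `check_tx2gene` and the `strip_version` option are hypothetical, and the toy data only imitate the tables shown above.

```r
# Hypothetical helper (not a tximport function): count how many kallisto
# target_ids are present in the first column of a tx2gene table.
check_tx2gene <- function(target_ids, tx2gene, strip_version = FALSE) {
  tx_names <- as.character(tx2gene[[1]])
  if (strip_version) {
    # Drop a trailing ".<digits>" version suffix, e.g. "ENST00000415118.1"
    target_ids <- sub("\\.[0-9]+$", "", target_ids)
    tx_names   <- sub("\\.[0-9]+$", "", tx_names)
  }
  matched <- target_ids %in% tx_names
  list(n_total   = length(target_ids),
       n_matched = sum(matched),
       unmatched_example = head(target_ids[!matched]))
}

# Toy data modelled on the tables in the question (assumed, for illustration)
target_ids <- c("ENST00000415118", "ENST00000448914", "ENST00000434970")
ucsc_tx2gene <- data.frame(
  TXNAME = c("uc002qsd.4", "uc002qsf.2", "uc003wyw.1"),
  GENEID = c("1", "1", "10"),
  stringsAsFactors = FALSE
)

res <- check_tx2gene(target_ids, ucsc_tx2gene)
res$n_matched  # no Ensembl ID is found among the UCSC IDs
```

On real data, `target_ids` could be taken from one abundance file, for example `read.delim(files[1])$target_id`. If almost nothing matches, the two tables use different ID systems, which is consistent with the "transcripts missing genes" message above.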





Which version of Ensembl are you using (v75, hg19)? I have a workaround that does not use the package.

Reply written 4.8 years ago by poisonAlien


I am also interested in a workaround, as I'm using Ensembl and biomaRt is not working properly. Would you mind sharing your workaround? I'm using hg38 / Ensembl v83, though. But if you have a workaround for hg19, I can probably adapt it for my purposes.


Reply written 4.6 years ago by ninninahm
4.8 years ago, Michael Love wrote:

hi Frymor,

A couple things:

tximport will be showing up on Bioconductor next week, so you can ask further questions on the Bioconductor support site (which runs on Biostars interface) and I will be notified to answer them.

You'll need to construct your own tx2gene table. The function needs to be able to group the transcript IDs into genes, so the names in the target_id column of the kallisto files must match the names in the first column of tx2gene.

There is a code chunk in the vignette which shows how to build the tx2gene if you have a TxDb (Bioconductor object roughly equivalent to a GTF file).

You can certainly do this with biomaRt (which may be slow) or check out the ensembldb package:
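
As a hedged sketch of the ensembldb route: the wrapper name `tx2gene_from_ensdb` is hypothetical, and the example assumes that an EnsDb annotation package matching the Ensembl release of the kallisto index (EnsDb.Hsapiens.v75 here, as an assumption) is installed.

```r
# Hypothetical wrapper: build a two-column tx2gene table
# (transcript ID, gene ID) from an EnsDb annotation object.
tx2gene_from_ensdb <- function(edb) {
  # Fetch all transcripts with their annotation columns as a DataFrame
  tx <- ensembldb::transcripts(edb, return.type = "DataFrame")
  # Keep transcript ID first, gene ID second, as tximport expects
  as.data.frame(tx[, c("tx_id", "gene_id")])
}

# Usage (assumes the annotation package is installed):
# library(EnsDb.Hsapiens.v75)
# tx2gene <- tx2gene_from_ensdb(EnsDb.Hsapiens.v75)
# txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)
```

The transcript IDs in `tx_id` are of the ENST form, so they should line up with Ensembl-based `target_id` values as long as both come from the same release and neither side carries a version suffix the other lacks.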



I have tried to use this, as it seems very easy with the prepared Ensembl 75/79 packages, but unfortunately I'm using hg38, and I am not able to get openssl installed and build my own package with the API as described in the vignette. Do you have any idea how I can get tximport to run when I'm using Ensembl rather than UCSC, and neither ensembldb nor biomaRt is working?

Thanks in advance!

Reply written 4.6 years ago by ninninahm

Sure, you can import any Ensembl GTF file to build a TxDb. See the makeTxDbFromGFF function in the GenomicFeatures package.
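
A minimal sketch of that route, assuming an Ensembl GTF has been downloaded locally. The wrapper name `tx2gene_from_gtf` and the file name in the usage comment are illustrative assumptions; the `keys()` / `select()` step follows the pattern the tximport vignette uses for a TxDb.

```r
# Hypothetical wrapper: build a tx2gene table from a GTF file via a TxDb.
tx2gene_from_gtf <- function(gtf_path) {
  # Parse the GTF into a TxDb object
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_path, format = "gtf")
  # All transcript names known to the TxDb (ENST IDs for an Ensembl GTF)
  tx_names <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  # Look up the gene ID of each transcript; columns are TXNAME, GENEID
  AnnotationDbi::select(txdb, keys = tx_names,
                        columns = "GENEID", keytype = "TXNAME")
}

# Usage (the path is an assumed example for Ensembl release 83):
# tx2gene <- tx2gene_from_gtf("Homo_sapiens.GRCh38.83.gtf.gz")
# txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)
```

The GTF should come from the same Ensembl release as the transcriptome used to build the kallisto index, otherwise some transcripts may still be reported as missing genes.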

Reply written 4.6 years ago by Michael Love