I aligned my samples with kallisto to a transcriptome for Plasmodium falciparum. I built the reference index from Plasmodium_falciparum.ASM276v2.cdna.all.fa.gz, which I downloaded from http://ftp.ensemblgenomes.org/pub/protists/release55/fasta/plasmodium_falciparum/cdna/Plasmodium_falciparum.ASM276v2.cdna.all.fa.gz.
However, I am having issues with tximport.
The error that I get is:
Error in .local(object, ...) : None of the transcripts in the quantification files are present in the first column of tx2gene. Check to see that you are using the same annotation for both. Example IDs (file): [CAX64123, CAX64256, CZT99967, ...] Example IDs (tx2gene): [CAD49011., CAD48976., CAD49073., ...] This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.
I understand that the problem seems to be in the mart object I created, and that I may be getting a different annotation version. However, I think the real problem is the external gene name. The mart object lists it as an attribute, but when I add it to t2g the column is empty. Has anyone had this issue before?
My script is below:
library(tximport)

# Build the transcript-to-gene table from Ensembl Protists
mart <- biomaRt::useMart("protists_mart", dataset = "pfalciparum_eg_gene", host = "https://protists.ensembl.org")
t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "external_gene_name"), mart = mart)
t2g <- dplyr::rename(t2g, gene_symbol = external_gene_name)
# Move the last column (gene_symbol) to the front
t2g <- t2g[, c(ncol(t2g), 1:(ncol(t2g) - 1))]

# One kallisto output directory per sample
accessions <- list.dirs(full.names = FALSE)[-1]
kallisto.dir <- paste0(accessions)
kallisto.files <- file.path(kallisto.dir, "abundance.tsv") # can also be abundance.h5
names(kallisto.files) <- accessions
tx.kallisto <- tximport(kallisto.files, type = "kallisto", tx2gene = t2g)
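One way to narrow down an error like the one above is to compare the two sets of IDs directly before calling tximport. The following is only a minimal sketch in base R, using toy vectors rather than real files. The first three IDs in each vector are copied from the error message; the fourth quantification ID is a hypothetical addition so that the version-stripping step has something to match. With real data, the quantification IDs would come from the target_id column of one abundance.tsv, and the tx2gene IDs from the first column of t2g.

```r
# Toy stand-ins for the real inputs (see the assumptions in the text above).
# With real data: quant_ids <- read.delim("<sample>/abundance.tsv")$target_id
quant_ids <- c("CAX64123", "CAX64256", "CZT99967",
               "CAD49011")                              # last ID is hypothetical
t2g_ids   <- c("CAD49011.", "CAD48976.", "CAD49073.")   # first column of tx2gene

# Fraction of quantified transcripts that have an exact match in tx2gene
overlap_raw <- mean(quant_ids %in% t2g_ids)

# Strip a trailing version suffix such as ".1" (or a bare trailing ".") and retry
strip_version <- function(x) sub("\\.[0-9]*$", "", x)
overlap_stripped <- mean(strip_version(quant_ids) %in% strip_version(t2g_ids))

print(c(raw = overlap_raw, stripped = overlap_stripped))
```

If the overlap stays near zero even after stripping versions, the first column of tx2gene probably does not hold the same kind of ID as the quantification files. It is worth checking the column order, since tximport expects transcript IDs in column 1 and gene IDs in column 2.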
Thank you. It does not seem to be that. I changed my script a bit. It now looks as shown below:
If I use the script as shown above, the counts in the tx.kallisto object contain just one number. If I comment out the second line and use the third line for the getBM attributes instead, I do get a file with the Ensembl gene IDs. So the external gene name seems to be causing the problem.
You did not do what I suggested above.
I figured out that most of the external gene name column was empty, which is why it was not working. I ended up using just the ensembl_gene_id, and it works fine now. Thank you.
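For anyone who still wants gene symbols where they exist, a possible approach is to count the empty entries first and fall back to the gene ID for the rest. This is a rough sketch on a toy data frame, not the poster's actual table: all IDs and symbols are hypothetical placeholders, and the column name gene_label is made up for illustration. In practice the table would come from biomaRt::getBM().

```r
# Toy tx2gene-like table; every ID and symbol here is a hypothetical placeholder
t2g <- data.frame(
  ensembl_transcript_id = c("tx1", "tx2", "tx3", "tx4"),
  ensembl_gene_id       = c("geneA", "geneA", "geneB", "geneC"),
  external_gene_name    = c("", "", "SYM_B", NA),
  stringsAsFactors      = FALSE
)

# Flag gene symbols that are missing (empty string or NA) and count them
missing_symbol <- is.na(t2g$external_gene_name) | t2g$external_gene_name == ""
n_missing <- sum(missing_symbol)

# Use the gene symbol where available, otherwise fall back to the gene ID
t2g$gene_label <- ifelse(missing_symbol, t2g$ensembl_gene_id, t2g$external_gene_name)

# tximport expects transcript IDs in column 1 and gene IDs in column 2
tx2gene <- t2g[, c("ensembl_transcript_id", "gene_label")]
print(n_missing)
print(tx2gene)
```

Without a fallback like this, every transcript with an empty symbol shares the same label, so those transcripts would presumably be summed into a single row. That may be what produced the single number described above.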