3 hours ago
LGG
Hi,
I'm trying to run a differential transcript usage (DTU) analysis in R on my RNA-seq data. I ran Salmon in mapping-based mode, building the index from Homo_sapiens.GRCh37.cdna.all.fa.
Now I want to run tximport, but I keep getting this error:
Error in medianLengthOverIsoform(length4CFA, tx2gene, ignoreTxVersion, :
all(txId %in% tx2gene$tx) is not TRUE
Calls: tximport -> medianLengthOverIsoform -> stopifnot
Execution halted
Here's my code:
library(GenomicFeatures)
library(tximport)
gtf <- "../Homo_sapiens.GRCh37.87.gtf.gz"
txdb <- makeTxDbFromGFF(gtf)
k <- keys(txdb, keytype="TXNAME")
tx2gene <- AnnotationDbi::select(txdb, keys=k, columns=c("TXNAME", "GENEID"), keytype="TXNAME")
colnames(tx2gene) <- c("tx","gene")
salmon_files <- list.files("/path/to/files")
samples <- data.frame(
  sample = salmon_files,
  path = sprintf("/path/to/files/%s", salmon_files)
)
# Import transcript-level Salmon quantifications
# Build files vector: for tximport, point to quant.sf file paths
files <- file.path(samples$path, "quant.sf")
names(files) <- samples$sample
txi <- tximport(files, type="salmon", txOut = TRUE, tx2gene = tx2gene, countsFromAbundance = "dtuScaledTPM", ignoreTxVersion=TRUE)
Here are the first rows of tx2gene and of one of my quant.sf files:
> head(tx2gene)
               tx            gene
1 ENST00000456328 ENSG00000223972
2 ENST00000515242 ENSG00000223972
3 ENST00000518655 ENSG00000223972
4 ENST00000450305 ENSG00000223972
5 ENST00000473358 ENSG00000243485
6 ENST00000469289 ENSG00000243485
> head(quant.sf)
               Name Length EffectiveLength TPM NumReads
1 ENST00000415118.1      8               8   0        0
2 ENST00000434970.2      9               9   0        0
3 ENST00000448914.1     13              13   0        0
4 ENST00000604642.1     23              23   0        0
5 ENST00000603326.1     19              19   0        0
6 ENST00000604950.1     31              31   0        0
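The failing check, all(txId %in% tx2gene$tx), suggests that at least one transcript ID in quant.sf has no match in tx2gene. A minimal sketch of how the unmatched IDs might be listed is below. It assumes that ignoring the version amounts to dropping everything from the first dot onward. find_missing_tx is a hypothetical helper, not a tximport function, and the vectors are toy input built from a few of the IDs shown above rather than the full tables.

```r
# Hypothetical helper (not part of tximport): list the quant.sf transcript IDs
# that have no match in tx2gene once the version suffix is removed.
find_missing_tx <- function(quant_names, tx2gene_tx) {
  # Assumption: the version is everything from the first dot onward,
  # e.g. "ENST00000415118.1" becomes "ENST00000415118".
  stripped <- sub("\\..*", "", quant_names)
  unique(stripped[!stripped %in% tx2gene_tx])
}

# Toy input for illustration only. On the real data these would be
# read.delim(files[1])$Name and tx2gene$tx.
quant_names <- c("ENST00000415118.1", "ENST00000434970.2", "ENST00000448914.1")
tx2gene_tx  <- c("ENST00000456328", "ENST00000515242", "ENST00000415118")

missing_tx <- find_missing_tx(quant_names, tx2gene_tx)
print(missing_tx)
```

On the real data, a non-empty result would point to transcripts that are in the cDNA FASTA used for the index but not in the GTF used to build tx2gene.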
If anyone could help me figure this out I would greatly appreciate it :)