Question: Discrepancy between abundance.tsv and tx2gene.csv
2.6 years ago, Mozart wrote:

I am testing the Kallisto/DESeq2 pipeline and am struggling with tximport: I need to prepare the tables produced so far before running DESeq2. For each sample I have an abundance.tsv file, and I need to combine(?) it with a .csv file that I created ad hoc (with known gene/transcript mappings). There is a discrepancy between the two annotations. For example, in my abundance file I have something like this:


but I would like to obtain something like this


in order to be recognised in my transcript2gene.csv file.

Here is my code:

dir <- system.file("extdata", package = "tximportData")
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)

txdb <-txdb <- select(, keys(, "ACCNUM") 
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")

'select()' returned 1:many mapping between keys and columns

tx2gene <- df[, 2:1]

#  TXNAME             GENEID
#1 ENSMUST00000000001 ENSMUSG00000000001
#2 ENSMUST00000000003 ENSMUSG00000000003
#3 ENSMUST00000114041 ENSMUSG00000000003
#4 ENSMUST00000000028 ENSMUSG00000000028
#5 ENSMUST00000096990 ENSMUSG00000000028
#6 ENSMUST00000115585 ENSMUSG00000000028

Then I write the result to a CSV file:

write.csv(tx2gene, file = "/tx2gene.csv")

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
names(files) <- paste0("sample", 1:6)
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene)

Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 4 5 6 
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Any useful hints?

2.6 years ago, erwan.scaon (Nantes, France) wrote:

If you want to strip the version suffix from the transcript IDs in your Kallisto abundance.tsv files (ENSMUST00000103493.2 -> ENSMUST00000103493), you can do the following:

for f in *.tsv; do
    # Leave the header row (NR == 1) alone; strip the version from column 1 elsewhere.
    awk -F '\t' -v OFS='\t' 'NR > 1 {sub(/\.[0-9]*/, "", $1)} 1' "$f" > "${f%%.*}_renamed.tsv"
done
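
To see what this does to a single file, here is a minimal sketch on made-up data. The two transcript rows and their numbers, the temporary directory, and the strip_versions helper are assumptions for illustration and are not part of the original answer. The regex here is anchored with `$` so that only a trailing ".<digits>" is removed.

```shell
#!/bin/sh
# Hypothetical demo data: a tiny abundance table with versioned transcript IDs.
workdir=$(mktemp -d)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' > "$workdir/abundance.tsv"
printf 'ENSMUST00000000001.4\t3262\t3063\t100\t12.5\n' >> "$workdir/abundance.tsv"
printf 'ENSMUST00000103493.2\t1200\t1001\t7\t2.1\n' >> "$workdir/abundance.tsv"

# Hypothetical helper: strip a trailing ".<digits>" version from column 1,
# leaving the header row (NR == 1) untouched and keeping tab separators.
strip_versions() {
    awk -F '\t' -v OFS='\t' 'NR > 1 { sub(/\.[0-9]+$/, "", $1) } 1' "$1"
}

strip_versions "$workdir/abundance.tsv" > "$workdir/abundance_renamed.tsv"

# Show the result: same columns, unversioned IDs in column 1.
cat "$workdir/abundance_renamed.tsv"
```

The error message in the question shows that summarizeToGene takes an ignoreTxVersion argument, so passing ignoreTxVersion = TRUE to tximport may be an alternative to rewriting the files, although that route was not tried in this thread.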

That's perfect. I solved my problem, thank you!

Reply written 2.6 years ago by Mozart