Question: Discrepancy between abundance.tsv and tx2gene.csv
Mozart wrote (7 months ago):

I am testing the Kallisto/DESeq2 pipeline and am struggling with tximport, which I need in order to prepare the tables from the analysis so far before running DESeq2. For each sample I have an abundance.tsv file, and I need to combine it with a .csv file that I created ad hoc (containing known gene/transcript mappings). The transcript identifiers do not match between the two files. For example, my abundance file has IDs like this:

ENSMUST00000103493.2

but I would like to obtain something like this:

ENSMUST00000103493

so that it matches the entries in my transcript2gene.csv file.

Here is my code:

dir <- system.file("extdata", package = "tximportData")
list.files(dir)
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
library(GenomicFeatures)
library(org.Mm.eg.db)
library(tximport)

txdb <- select(org.Mm.eg.db, keys(org.Mm.eg.db), "ACCNUM")
txdb
k <- keys(txdb, keytype = "GENEID")
k
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
df

'select()' returned 1:many mapping between keys and columns

tx2gene <- df[, 2:1]
head(tx2gene)

#  TXNAME             GENEID
#1 ENSMUST00000000001 ENSMUSG00000000001
#2 ENSMUST00000000003 ENSMUSG00000000003
#3 ENSMUST00000114041 ENSMUSG00000000003
#4 ENSMUST00000000028 ENSMUSG00000000028
#5 ENSMUST00000096990 ENSMUSG00000000028
#6 ENSMUST00000115585 ENSMUSG00000000028

Then I write the results to a CSV file:

write.csv(tx2gene, file = "/tx2gene.csv")

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
names(files) <- paste0("sample", 1:6)
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene)
head(txi.kallisto.tsv$counts)

Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 4 5 6 
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Any useful hints?
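
The error message suggests checking that both files use the same identifiers. A minimal shell sketch of that check follows, run on made-up stand-in files. The version suffixes and quantification values are invented for illustration, and the tx2gene.csv layout assumes the write.csv() defaults (quoted fields, row names in the first column), so the transcript IDs sit in column 2.

```shell
# Stand-in files in a scratch directory (contents are illustrative only).
tmpdir=$(mktemp -d)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' > "$tmpdir/abundance.tsv"
printf 'ENSMUST00000000001.4\t1000\t850.5\t12\t3.4\n' >> "$tmpdir/abundance.tsv"
printf 'ENSMUST00000000003.15\t2000\t1850.5\t7\t1.1\n' >> "$tmpdir/abundance.tsv"

printf '"","TXNAME","GENEID"\n' > "$tmpdir/tx2gene.csv"
printf '"1","ENSMUST00000000001","ENSMUSG00000000001"\n' >> "$tmpdir/tx2gene.csv"
printf '"2","ENSMUST00000000003","ENSMUSG00000000003"\n' >> "$tmpdir/tx2gene.csv"

# Transcript IDs from each file, header dropped, sorted for comm.
cut -f1 "$tmpdir/abundance.tsv" | tail -n +2 | sort > "$tmpdir/quant_ids.txt"
cut -d, -f2 "$tmpdir/tx2gene.csv" | tail -n +2 | tr -d '"' | sort > "$tmpdir/map_ids.txt"

# Count IDs present in both lists, first as-is, then with ".<digits>" removed.
shared_raw=$(comm -12 "$tmpdir/quant_ids.txt" "$tmpdir/map_ids.txt" | wc -l | tr -d ' ')
shared_stripped=$(sed 's/\.[0-9][0-9]*$//' "$tmpdir/quant_ids.txt" | sort \
    | comm -12 - "$tmpdir/map_ids.txt" | wc -l | tr -d ' ')

echo "shared IDs as-is: $shared_raw"
echo "shared IDs after stripping versions: $shared_stripped"
```

If the first count is zero and the second is not, the version suffix alone would explain the mismatch.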

erwan.scaon wrote (7 months ago):

If you want to convert ENSMUST00000103493.2 -> ENSMUST00000103493 in your Kallisto abundance.tsv files, you can do the following:

# Strip the trailing ".<version>" from column 1 on every line except the header.
for f in *.tsv; do
  awk -F '\t' -v OFS='\t' 'NR > 1 { sub(/\.[0-9]+$/, "", $1) } 1' "$f" > "${f%%.*}_renamed.tsv"
done
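
To see what the awk step does before touching real output, here is a small sketch on a made-up file. The quantification values and the second transcript's version number are invented for illustration, and the regex is anchored to the end of the ID, so only a trailing ".<digits>" is removed.

```shell
# Build a tiny stand-in for a kallisto abundance.tsv (values are made up).
tmpdir=$(mktemp -d)
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' > "$tmpdir/abundance.tsv"
printf 'ENSMUST00000103493.2\t1000\t850.5\t12\t3.4\n' >> "$tmpdir/abundance.tsv"
printf 'ENSMUST00000000001.4\t2000\t1850.5\t7\t1.1\n' >> "$tmpdir/abundance.tsv"

# Strip a trailing ".<digits>" from column 1 on every line except the header;
# the final "1" prints each line, modified or not.
awk -F '\t' -v OFS='\t' 'NR > 1 { sub(/\.[0-9]+$/, "", $1) } 1' \
    "$tmpdir/abundance.tsv" > "$tmpdir/abundance_renamed.tsv"

# Show the transcript IDs after renaming (header skipped).
cut -f1 "$tmpdir/abundance_renamed.tsv" | tail -n +2
```

Only the first column is edited, so decimal values in the other columns such as eff_length are left alone. The summarizeToGene() call in the error message also lists an ignoreTxVersion argument, which suggests tximport may be able to ignore the version suffix itself; that route is not tested here.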

Mozart replied (7 months ago):

That's perfect. I solved my problem, thank you!