Question: using tximport to prepare salmon quantification files for DEG analysis
0
gravatar for tarek.mohamed
10 months ago by
tarek.mohamed250
tarek.mohamed250 wrote:

Hi All,

I am using tximport to prepare quant.sf files generated from salmon for Deseq2 DEG analysis. My quant.sf and txgenes files looks fine. However I got a message telling me that I have 3468transcripts missing from tx2gene. I attached the code for tx2gene object generation. What is causing this?

library(biomaRt)
listMarts()
ensembl<-useMart("ensembl") 
listDatasets(ensembl) 
ensembl<-useDataset("hsapiens_gene_ensembl",mart=ensembl)
listFilters(ensembl)
ids<-read.delim(files[1],header = T,sep = "\t")
head(ids)
ids <- as.character(ids[,1])
ids
output<-getBM(filters="ensembl_transcript_id",
              attributes=c("ensembl_transcript_id","external_gene_name"),
              values=ids,mart=ensembl)

txi_test <- tximport("quant.sf", type="salmon",
                      tx2gene=output, dropInfReps=TRUE)

reading in files with read_tsv

1

transcripts missing from tx2gene: 3468

summarizing abundance

summarizing counts

summarizing length

#

 head(tnnt2neo_3uMdox_3uMdesp_rep2)
             Name Length EffectiveLength TPM NumReads
    1 ENST00000434970      9               2   0        0
    2 ENST00000448914     13               3   0        0
    3 ENST00000415118      8               2   0        0
    4 ENST00000632684     12               3   0        0
    5 ENST00000631435     12               3   0        0
    6 ENST00000430425     17               3   0        0

#

head(output)
  ensembl_transcript_id external_gene_name
1       ENST00000337114             PKD1L2
2       ENST00000358097             CYP2D7
3       ENST00000375726             CASP12
4       ENST00000377032          IGKV1D-12
5       ENST00000377712              NAT8B
6       ENST00000379435        TRBV20OR9-2
rna-seq tximport • 786 views
ADD COMMENTlink modified 10 months ago by h.mon24k • written 10 months ago by tarek.mohamed250

You may use setdiff() to investigate the missing entries:

setdiff( output$ensembl_transcript_id, tnnt2neo_3uMdox_3uMdesp_rep2$Name )

setdiff( tnnt2neo_3uMdox_3uMdesp_rep2$Name, output$ensembl_transcript_id )

Or use [ , 1 ], in case $ fails:

setdiff( output[ , 1 ], tnnt2neo_3uMdox_3uMdesp_rep2[ , 1 ] )

setdiff( tnnt2neo_3uMdox_3uMdesp_rep2[ , 1 ], output[ , 1 ] )

This would tell the missing transcripts, but not why they are missing. I guess the cause lies in how you created the tx2gene output table - which you didn't tell us.

EDIT: am I going crazy and I didn't see the getBM() code, or did you add it after my comment?

Anyway, what is the version of your reference transcriptome? How did you create it? Did you check the ids of the missing transcripts?

ADD REPLYlink modified 10 months ago • written 10 months ago by h.mon24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1340 users visited in the last hour