Outputting gexpr() with gene_name (or gene symbol) instead of MSTRG.x gene_id
3.8 years ago
ever_wudi ▴ 10

Hi, I am trying to use Ballgown to output a gene-by-sample expression matrix. I ran geneexp = gexpr(bg), then write.csv(geneexp, "output.csv", row.names = TRUE). However, the resulting matrix uses MSTRG.x gene IDs as row identifiers. How can I output the matrix with gene_name (or gene symbol) as identifiers instead, since the MSTRG.x IDs are of no use to me?

Thanks! Di
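
One possible way to attach gene names to the gexpr() matrix is to build a gene_id to gene_name lookup from the transcript table and apply it to the matrix row names. The sketch below uses small made-up tables as stand-ins for texpr(bg, 'all') and gexpr(bg) so that it runs without Ballgown; the values, the FPKM.s1/FPKM.s2 column names, and the assumption that unannotated transcripts carry "." as gene_name are illustrative and should be checked against your own data.

```r
# Toy stand-in for texpr(bg, 'all'): one row per transcript (hypothetical values).
tx_table <- data.frame(
  t_id      = 1:4,
  gene_id   = c("MSTRG.1", "MSTRG.1", "MSTRG.2", "MSTRG.3"),
  gene_name = c("Btf3l4", "Btf3l4", "Actb", "."),
  stringsAsFactors = FALSE
)

# Toy stand-in for gexpr(bg): rows are named by gene_id (hypothetical values).
gene_expr <- matrix(
  c(7.3, 7.4, 120.0, 118.5, 0.9, 1.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MSTRG.1", "MSTRG.2", "MSTRG.3"), c("FPKM.s1", "FPKM.s2"))
)

# Keep annotated transcripts only (assumption: "." marks a missing gene_name),
# then keep the first gene_name seen for each gene_id.
annotated  <- tx_table[tx_table$gene_name != ".", c("gene_id", "gene_name")]
annotated  <- annotated[!duplicated(annotated$gene_id), ]
id_to_name <- setNames(annotated$gene_name, annotated$gene_id)

# Look up a name for every row of the expression matrix;
# fall back to the MSTRG id where no name is available.
gene_names <- unname(id_to_name[rownames(gene_expr)])
missing    <- is.na(gene_names)
gene_names[missing] <- rownames(gene_expr)[missing]

# Keep both identifiers as columns so rows stay traceable.
named_expr <- data.frame(
  gene_id   = rownames(gene_expr),
  gene_name = gene_names,
  gene_expr,
  row.names = NULL,
  check.names = FALSE
)

print(named_expr)
# write.csv(named_expr, "output.csv", row.names = FALSE)
```

Taking the first gene_name per gene_id is a simplification: a single MSTRG locus can overlap more than one annotated gene, in which case one name is chosen arbitrarily here.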

RNA-Seq Ballgown Stringtie gene_name • 1.5k views

Hi, did you solve this issue?

3.8 years ago
ever_wudi ▴ 10

I figured out one way to do it. I used whole_tx_table = texpr(my.humandata, 'all') to extract everything into whole_tx_table, then final_fpkm_table = whole_tx_table[c("gene_name","sample 1","sample 2", ..)] to slice out only the gene_name and FPKM values, and wrote final_fpkm_table to a .csv file. However, one problem I found in final_fpkm_table.csv is that the gene_names are not unique: there can be many rows for a single gene such as 'Btf3l4', as shown below. What should I do with these values? Should I take the sum, average, or max of the duplicate rows to generate a gene_name-expression matrix with unique gene names? Also, can edgeR, FPKM_count.py, or RSEM be used to generate such a matrix?

Thanks for any advice.

        Sample 1    Sample 2    Sample 3    Sample 4
Btf3l4  7.267802    7.386622    9.815619    9.739746
Btf3l4  0.941536    1.256349    1.365669    1.3953
Btf3l4  0.897259    0.718018    0.025479    0.168297
Btf3l4  0.823937    0.744246    1.132339    1.020087
Btf3l4  0.42134     0.351375    0.236908    0.517893
Btf3l4  1.219011    1.331794    2.030579    1.207322
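
The repeated rows most likely arise because texpr() returns one row per transcript, so a gene with several isoforms appears several times. Of the options raised above, summing is sketched here using base R's rowsum(); this is only an illustration of that one option, not a claim about how gexpr() derives gene-level values, and the extra 'Actb' row and the sample1/sample2 column names are made up for the example.

```r
# Toy transcript-level table: the first three Btf3l4 rows and first two
# samples are taken from the table above; the Actb row is hypothetical.
tx_fpkm <- data.frame(
  gene_name = c("Btf3l4", "Btf3l4", "Btf3l4", "Actb"),
  sample1   = c(7.267802, 0.941536, 0.897259, 100),
  sample2   = c(7.386622, 1.256349, 0.718018, 90),
  stringsAsFactors = FALSE
)

# Collapse rows that share a gene_name by summing each sample column.
# rowsum() returns one row per group, with the group label as the row name.
gene_fpkm <- rowsum(tx_fpkm[, c("sample1", "sample2")],
                    group = tx_fpkm$gene_name)

print(gene_fpkm)
# write.csv(gene_fpkm, "gene_name_fpkm.csv", row.names = TRUE)
```

Replacing rowsum() with aggregate(. ~ gene_name, data = tx_fpkm, FUN = max) would give the max variant instead; which summary is appropriate depends on the downstream analysis.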
