Hello all,
I would like to build a tx2gene object with 3 columns, where the third column contains gene names such as PLA2G4A.
The annotation I am using looks like this:
##description: evidence-based annotation of the human genome (GRCh38), version 32 (Ensembl 98)
##provider: GENCODE
##contact: gencode-help@ebi.ac.uk
##format: gtf
##date: 2019-09-05
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "lncRNA"; transcript_name "DDX11L1-202"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
First, I created a tx2gene object with 2 columns:
> library(GenomicFeatures)
> txdb <- makeTxDbFromGFF(file="gencode.v32.annotation.gtf")
> saveDb(x=txdb, file = "gencode.v32.annotation.TxDb")
> k <- keys(txdb, keytype = "TXNAME")
> tx2gene <- select(txdb, k, "GENEID", "TXNAME")
> head(tx2gene)
TXNAME GENEID
1 ENST00000456328.2 ENSG00000223972.5
2 ENST00000450305.2 ENSG00000223972.5
3 ENST00000473358.1 ENSG00000243485.5
4 ENST00000469289.1 ENSG00000243485.5
5 ENST00000607096.1 ENSG00000284332.1
6 ENST00000606857.1 ENSG00000268020.3
But how can I add a third column with gene names?
There are 2 problems:
1) The main problem: I can't find gene names in the list returned by columns(txdb):
> columns(txdb)
[1] "CDSCHROM" "CDSEND" "CDSID" "CDSNAME" "CDSPHASE" "CDSSTART"
[7] "CDSSTRAND" "EXONCHROM" "EXONEND" "EXONID" "EXONNAME" "EXONRANK"
[13] "EXONSTART" "EXONSTRAND" "GENEID" "TXCHROM" "TXEND" "TXID"
[19] "TXNAME" "TXSTART" "TXSTRAND" "TXTYPE"
2) Even if they were in txdb, how would I add them as a third column?
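To make both problems concrete, here is a minimal base R sketch of one possible workaround, not necessarily the recommended one. Since columns(txdb) lists no gene-name column, it assumes the names have to be read from the gene_name attribute of the GTF itself. The helper get_attr and the variables gtf_lines and two_col are hypothetical names made up for this illustration, and the input is just two shortened lines from the annotation shown above.

```r
# Two lines taken from the annotation above (attribute lists shortened).
# Fields are tab-separated in a real GTF, so they are pasted with "\t" here.
gtf_lines <- c(
  paste("chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG00000223972.5"; gene_name "DDX11L1"; level 2;',
        sep = "\t"),
  paste("chr1", "HAVANA", "transcript", "11869", "14409", ".", "+", ".",
        'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_name "DDX11L1"; transcript_name "DDX11L1-202";',
        sep = "\t")
)
# For the real file (assumption: it fits in memory), drop the "##" header lines:
# gtf_lines <- grep("^#", readLines("gencode.v32.annotation.gtf"),
#                   invert = TRUE, value = TRUE)

# Hypothetical helper: return the quoted value of one attribute key,
# or NA when the key is absent from the attribute string.
get_attr <- function(attrs, key) {
  pattern <- paste0('(^|; )', key, ' "([^"]*)"')
  m <- regmatches(attrs, regexec(pattern, attrs))
  vapply(m, function(x) if (length(x) >= 3) x[3] else NA_character_,
         character(1))
}

fields  <- strsplit(gtf_lines, "\t", fixed = TRUE)
feature <- vapply(fields, `[`, character(1), 3)  # column 3: feature type
attrs   <- vapply(fields, `[`, character(1), 9)  # column 9: attributes
is_tx   <- feature == "transcript"               # keep transcript lines only

# Problem 1: build the three-column table directly from the GTF attributes.
tx2gene <- data.frame(
  TXNAME   = get_attr(attrs[is_tx], "transcript_id"),
  GENEID   = get_attr(attrs[is_tx], "gene_id"),
  GENENAME = get_attr(attrs[is_tx], "gene_name"),
  stringsAsFactors = FALSE
)
print(tx2gene)

# Problem 2: extend an existing two-column table by matching on TXNAME.
# The second transcript has no line in this toy input, so it gets NA.
two_col <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000450305.2"),
  GENEID = "ENSG00000223972.5",
  stringsAsFactors = FALSE
)
two_col$GENENAME <- tx2gene$GENENAME[match(two_col$TXNAME, tx2gene$TXNAME)]
print(two_col)
```

Matching on TXNAME with match() keeps the row order of the existing table, which is why it is used here instead of merge(). On the full annotation the regular-expression parsing may be slow, so a dedicated GTF reader could be a better fit.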
Thank you very much!
Best regards, Poecile
Thank you very much, I had never heard of this package.