Total number of genes in GENCODE Release 43
12 months ago
ahmad • 0

Hi everyone

This is my first post, and I am currently learning Bioconductor.

Does anyone know why there is a difference between the total number of genes for GENCODE Release 43, which is 62703 (https://www.gencodegenes.org/human/stats_43.html), and the number of genes extracted using Bioconductor with the following code?

gtf_43<-rtracklayer::import('gencode.v43.primary_assembly.annotation.gtf') 
dtgtf_44<-data.frame(gtf_43)
genes <- unique(obj_43[ ,c("gene_id","gene_name")]) 
nrow(genes)

The result is 62757

Thank you in advance

gencode gene • 906 views

obj_43 is not defined in this code chunk which decreases my confidence in the result. Just count the unique gene IDs.


Hello, thanks for replying. I pasted the wrong code by mistake. The following is the corrected version:

gtf_43<-rtracklayer::import('gencode.v43.primary_assembly.annotation.gtf')
dtgtf_43<-data.frame(gtf_43)
genes <- unique(dtgtf_43[ ,c("gene_id","gene_name")])
geneId <- unique(dtgtf_43[ ,c("gene_id")])
length(geneId) ## result 62757
nrow(genes) ## result


The following is the correct code:

gtf_43<-rtracklayer::import('gencode.v43.primary_assembly.annotation.gtf')
dtgtf_43<-data.frame(gtf_43)
nrow(dtgtf_43)
genes <- unique(dtgtf_43[ ,c("gene_id","gene_name")])
geneId <- unique(dtgtf_43[ ,c("gene_id")])
length(geneId)
nrow(genes)

12 months ago
ATpoint 81k

As I said, I assume that you are doing something on old/overwritten variables or a wrong GTF. This is how it is:

a<-rtracklayer::import("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz")
length(unique(a$gene_id))
[1] 62703

nrow(data.frame(a[a$type=="gene",]))
[1] 62703
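
A minimal sketch of why the two counts above agree, using a toy data frame in place of the imported GTF. The rows and gene IDs are made up for illustration; the point is that each gene has exactly one row of type "gene", and its transcript and exon rows carry the same gene_id.

```r
# Toy stand-in for data.frame(rtracklayer::import(...)).
# All sequence names and IDs here are hypothetical.
gtf <- data.frame(
  seqnames = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  type     = c("gene", "transcript", "exon", "gene", "transcript"),
  gene_id  = c("G1.1", "G1.1", "G1.1", "G2.1", "G2.1"),
  stringsAsFactors = FALSE
)

# Approach 1: count distinct gene IDs across all feature rows
n_unique_ids <- length(unique(gtf$gene_id))

# Approach 2: count rows whose feature type is "gene"
n_gene_rows <- nrow(gtf[gtf$type == "gene", ])

# Both approaches should give the same number of genes
stopifnot(n_unique_ids == n_gene_rows)
n_unique_ids
```

If the two numbers ever disagree on a real file, that would suggest a stale variable or a malformed GTF rather than a problem with either counting method.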

Thank you, that is correct. In my code I was using (https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.primary_assembly.annotation.gtf.gz) and not ("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz").

Based on what I have read, gencode.vX.primary_assembly.annotation contains gene annotation on the reference chromosomes and scaffolds, while gencode.vX.annotation contains annotation (genes, transcripts, exons, start_codon, stop_codon, UTRs, CDS) on the reference chromosomes. What does that mean, what is the difference between them, and which one should be used for the alignment process? Thanks in advance.
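
The gap between the two counts reported in this thread is 62757 - 62703 = 54 genes. One way to see where such extra genes come from is to compare gene IDs between the two files and tabulate the sequences they sit on. The sketch below uses toy data frames in place of the "gene" rows of the two imported GTFs; all IDs and sequence names are hypothetical, and it assumes (as the file descriptions suggest, but this thread does not verify) that the extra genes lie on scaffolds.

```r
# Toy stand-ins for the "gene" rows of the two annotation files.
# Every ID and sequence name below is hypothetical.
chr_only <- data.frame(
  seqnames = c("chr1", "chr2"),
  gene_id  = c("G1.1", "G2.1"),
  stringsAsFactors = FALSE
)
primary <- data.frame(
  seqnames = c("chr1", "chr2", "scaffold_A"),
  gene_id  = c("G1.1", "G2.1", "G3.1"),
  stringsAsFactors = FALSE
)

# Gene IDs present only in the primary-assembly annotation
extra_ids <- setdiff(primary$gene_id, chr_only$gene_id)

# Which sequences carry those extra genes, and how many each
extra_by_seq <- table(primary$seqnames[primary$gene_id %in% extra_ids])

extra_ids
extra_by_seq
```

With real data, the toy frames could be replaced by something like data.frame(a[a$type == "gene", ]) for each imported file, where a is the object returned by rtracklayer::import as in the answer above.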
