I need to know how many protein-coding genes are on the human Y chromosome. First, I tried filtering the GTF file in R with this code:
    # load the GTF file
    gtf <- rtracklayer::import('~/lapd/Index_hum/ann/Homo_sapiens.GRCh38.104.gtf')
    gtf_df <- as.data.frame(gtf)

    # keep only protein-coding gene records
    library(dplyr)
    gtf_filt <- filter(gtf_df, type == 'gene', gene_biotype == 'protein_coding')

    # a GTF imported with rtracklayer stores the chromosome in 'seqnames'
    chrY <- filter(gtf_filt, seqnames == 'Y')
This gave me 46 protein-coding genes. I thought I had made a mistake, so I tried biomaRt:
    library(biomaRt)
    library(dplyr)
    ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    new <- getBM(attributes = c("chromosome_name", "ensembl_gene_id"),
                 filters = 'biotype',
                 values = c('protein_coding'),
                 mart = ensembl)
    chrY <- filter(new, chromosome_name == 'Y')
This also gave 46 protein-coding genes. I then ran into a second problem when I compared the number of protein-coding genes in the Ensembl statistics with the number in my Ensembl annotation file. The Ensembl statistics page (http://www.ensembl.org/Homo_sapiens/Location/Genome?r=MT) lists 20,442 coding genes, but the annotation file contains 19,937.
Where is my mistake? Or is this difference normal?