Filter genes of interest from a GTF file in R
23 months ago

Hello everyone. I'm trying to filter a GTF file in R so that it keeps only the lines for the 100 genes I'm interested in. The GTF file is already loaded as a data frame with one column holding the gene_id of each line, and the file with the genes of interest likewise has a single column with the gene_id of those genes. I've tried to use filter (package: dplyr), but I get the message:

> gtf_filtered <- filter(gtf2, gene == top3)
Error in `filter()`:
! Problem while computing `..1 = gene == top3`.
x Input `..1` must be of size 1326608 or 1, not size 100.
Run `rlang::last_error()` to see where the error occurred.

Is there a way to solve this, or another package/function I can use to filter the file? Filtering on one gene_id at a time doesn't cause any problem, for example:

> gtf_filter_10 <- filter(gtf2, gene == 'gene_id PRX4')

Thanks in advance

Filter Gene R • 2.0k views

Thank you for describing the data frame in detail. Please post some example data and the expected output, or the output of dput() in R.


The data is just a normal gtf file with 9 columns (see http://www.ensembl.org/info/website/upload/gff.html for gtf file details). The list with the genes of interest has just one column with 100 lines with the gene_ids (like 'gene_id PRX4'). The expected output is the same as the original gtf, but with only the wanted genes.
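One thing worth checking with IDs written like 'gene_id PRX4': a membership test only matches exact strings, so a list entry 'gene_id PRX4' will not match a bare 'PRX4' in a parsed gene_id column (importers such as rtracklayer::import usually return the ID without the attribute key). A minimal sketch of normalizing both sides first, assuming a hypothetical vector `ids` whose values are invented for illustration:

```r
# Hypothetical IDs in the mixed forms that can show up after splitting
# the GTF attribute column by hand (values are made up).
ids <- c("gene_id PRX4", "gene_id \"ABC1\"", "PRX4")

# Drop a leading "gene_id " key and any double quotes, leaving the bare ID.
clean_ids <- gsub("^gene_id[[:space:]]+|\"", "", ids)

print(clean_ids)
```

Applying the same cleanup to the gene list and to the GTF column before filtering keeps the two in the same form.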


Thank you for your data explanation again.

23 months ago
Chirag Parsania ★ 2.0k
library(dplyr)  # provides %>% and filter()

gff_file <- "path/to/gff/file"

# Import the GFF/GTF and convert it to a tibble
gff_annot <- rtracklayer::import(gff_file) %>%
    as.data.frame() %>%
    tibble::as_tibble()

Load the gene file:

gene_file <- "path/to/gene/file"
# If the file has no header row, add col_names = FALSE so the first gene is not read as a column name.
genes_to_filter <- readr::read_delim(gene_file, delim = "\t") %>% dplyr::pull(1)

Filter the GFF for the desired genes:

# gene_id is the column name in the data frame gff_annot; change it to match the column name in your GFF file.
gff_filtered <- gff_annot %>%
    dplyr::filter(gene_id %in% genes_to_filter)
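
As for why the original attempt failed: `==` compares element by element, so a list of 100 IDs cannot be lined up against 1,326,608 rows, while `%in%` returns one TRUE/FALSE per row regardless of how long the list is. Here is a small base R sketch of the same idea, assuming toy stand-ins for `gtf2` and `top3` from the question (the `feature` column and every gene name other than PRX4 are invented, and `top3` is assumed to be a one-column data frame):

```r
# Toy stand-ins for the asker's objects; real data would have 9 GTF columns.
gtf2 <- data.frame(
  feature = c("exon", "CDS", "exon", "exon", "CDS"),
  gene    = c("gene_id PRX4", "gene_id PRX4", "gene_id ABC1",
              "gene_id XYZ2", "gene_id ABC1"),
  stringsAsFactors = FALSE
)
top3 <- data.frame(
  gene = c("gene_id PRX4", "gene_id XYZ2"),
  stringsAsFactors = FALSE
)

# Compare against the column of top3, not the data frame itself.
# %in% yields one logical value per row of gtf2.
keep <- gtf2$gene %in% top3$gene

# Keep only the matching rows, with all columns.
gtf_filtered <- gtf2[keep, , drop = FALSE]
print(gtf_filtered)
```

With dplyr the equivalent would be `filter(gtf2, gene %in% top3$gene)`, again assuming the column in `top3` is called `gene`.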