I am trying to remove columns from a GTF file so that I can use it for featureCounts analysis.
I only want to keep the columns containing the gene ID, chromosome, start, end and strand, and discard the rest. I have managed to do this in R by loading the file into a data frame, but I am unsure how to convert the data frame back into a GTF file:
```r
# install the Bioconductor packages used below (only needed once)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Rsubread", "rtracklayer"))
library(Rsubread)

# open the GTF file as a data frame
gtf <- rtracklayer::import("data.gtf")
gtf_df <- as.data.frame(gtf)

# keep gene ID, chromosome, start, end and strand; the chromosome is stored
# in 'seqnames', so it must be kept rather than dropped
df <- subset(gtf_df, select = c(gene_id, seqnames, start, end, strand))
```
Is there a way to edit the GTF file on Linux, or a way to write my R data frame back out as a GTF file?
featureCounts understands a properly formatted GTF file, so are you not able to use it as is? What you are trying to create is a Simple Annotation Format (SAF) file, which is no longer in GTF format. featureCounts can use SAF files as well.
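If you do want SAF rather than the original GTF, the conversion is a column selection plus a rename. Below is a minimal base-R sketch, assuming the data frame comes from `as.data.frame(rtracklayer::import(...))` and therefore has `type`, `gene_id`, `seqnames`, `start`, `end` and `strand` columns. The helper name `gtf_to_saf` and the toy values are hypothetical; the output columns `GeneID`, `Chr`, `Start`, `End`, `Strand` are the SAF header.

```r
# Hypothetical helper: build an SAF table from a data frame shaped like
# as.data.frame(rtracklayer::import("data.gtf")).
gtf_to_saf <- function(gtf_df, feature = "exon") {
  # SAF has no feature-type column, so keep only one feature type
  # (featureCounts counts exons by default when given a GTF)
  rows <- gtf_df[which(gtf_df$type == feature), , drop = FALSE]
  data.frame(
    GeneID = as.character(rows$gene_id),
    Chr    = as.character(rows$seqnames),
    Start  = as.integer(rows$start),
    End    = as.integer(rows$end),
    Strand = as.character(rows$strand),
    stringsAsFactors = FALSE
  )
}

# Toy input standing in for gtf_df (made-up values for illustration)
toy <- data.frame(
  seqnames = c("chr1", "chr1", "chr1"),
  start    = c(100L, 100L, 400L),
  end      = c(500L, 200L, 450L),
  strand   = c("+", "+", "+"),
  type     = c("gene", "exon", "exon"),
  gene_id  = c("geneA", "geneA", "geneA"),
  stringsAsFactors = FALSE
)

saf <- gtf_to_saf(toy)
print(saf)

# Write a tab-separated SAF file with a header line
saf_file <- tempfile(fileext = ".saf")
write.table(saf, saf_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The data frame can be passed straight to Rsubread, e.g. `featureCounts(files = "sample.bam", annot.ext = saf, isGTFAnnotationFile = FALSE)`, or the written file can be given to the command-line tool with `-F SAF -a`. Filtering to exon rows matters: the `gene` row spans introns, so leaving it in would pull intronic reads into each gene's count.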