10.1 years ago
jon.brate
It is easy to define intergenic regions if the GFF/GTF file contains a "gene" line (see: http://davetang.org/muse/2013/01/18/defining-genomic-regions/). But I am using a GTF file generated by Cufflinks, and it only contains exon lines. How can I separate the intronic from the intergenic regions based on such a file?
Thanks for the advice!
I can make lists of the start and end positions of each gene with, e.g.:
gene.start = lapply(object, start)
But I don't quite understand how to get the chromosome names. I tried
lapply(object, seqnames)
and
seqnames(object)
but how do I combine these with the start and end coordinates?
Edit: I found a slightly different solution here: https://support.bioconductor.org/p/66003/
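library(GenomicFeatures)  # exonsBy; makeTxDbFromGFF (moved to txdbmaker in newer Bioconductor releases)
library(rtracklayer)      # export.gff
gtf = makeTxDbFromGFF("mygtf.gtf", format = "gtf")  # build a TxDb from the Cufflinks GTF
gene = exonsBy(gtf, "gene")                         # exons grouped by gene
intergenic = gaps(unlist(range(gene)))              # gaps between the per-gene spans
export.gff(intergenic, "intergenic.gff", format="gff")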
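To make the difference between intronic and intergenic regions concrete, here is a minimal sketch on toy data, not on the real GTF. The coordinates, the gene IDs (geneA, geneB) and the chromosome length are all made up, and strand is deliberately ignored, so overlapping genes on opposite strands would be merged. Note that gaps() also returns whole-chromosome ranges for the "+" and "-" strands, which is why only the "*" ranges are kept here.

```r
library(GenomicRanges)

# Hypothetical exons for two made-up genes on a 2000 bp chromosome
exons <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(100, 300, 1000, 1200),
                   end   = c(200, 400, 1100, 1300)),
  strand = "+",
  gene_id = c("geneA", "geneA", "geneB", "geneB"),
  seqlengths = c(chr1 = 2000)
)

# One span per gene, from first exon start to last exon end
by_gene   <- split(exons, exons$gene_id)
gene_span <- unlist(range(by_gene))

# Assumption: treat the genome as unstranded, so merge spans across strands
genic <- reduce(gene_span, ignore.strand = TRUE)

# Intergenic = everything outside the gene spans; keep only the "*" strand,
# since gaps() reports the full chromosome for the unused "+" and "-" strands
intergenic <- gaps(genic)
intergenic <- intergenic[strand(intergenic) == "*"]

# Intronic = inside a gene span but not covered by any exon
intronic <- setdiff(genic, exons, ignore.strand = TRUE)

# Chromosome names together with start and end coordinates, as a table
intron_table <- as.data.frame(intronic)
```

as.data.frame() on a GRanges object gives the seqnames, start and end columns side by side, which is one way to combine chromosome names with coordinates.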