Hi,
I have a batch of GFF files for bacteria genomes, which have been edited from the original GFF file from NCBI ftp site. In the edited GFF, I have inserted sRNA features. This is an example (dummy example) to illustrate the edited GFF:
##gff-version 3 #!gff-spec-version 1.20 #!processor NCBI annotwriter ##sequence-region NC_016077.1 1 2487765 ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=568816 NC_016077.1 RefSeq region 1 2487765 . + . ID=id0;Dbxref=taxon:568816;Is_circular=true;gbkey=Src;genome=chromosome;mol_type=genomic DNA;old-name=Acidaminococcus intestinalis RYC-MR95;strain=RyC-MR95 NC_016077.1 RefSeq CDS 175 1551 . + 0 ID=cds0;Name=YP_004895395.1;Parent=gene0;Note=COG0593;Dbxref=Genbank:YP_004895395.1,GeneID:11266123;gbkey=CDS;product=chromosomal replication initiator protein;protein_id=YP_004895395.1;transl_table=11 NC_016077.1 RefSeq CDS 1626 1778 . + 0 ID=cds1;Name=YP_004895396.1;Parent=gene1;Dbxref=Genbank:YP_004895396.1,GeneID:11264393;gbkey=CDS;product=hypothetical protein;protein_id=YP_004895396.1;transl_table=11 NC_016077.1 RefSeq sRNA 1799 1825 . + 0 ID=sRNA1;Name=sRNA1;Parent=gene2;Dbxref=Genbank:YP_004895397.1,GeneID:11264394;gbkey=sRNA;product=sRNA 1;protein_id=YP_004895397.1;transl_table=11 NC_016077.1 RefSeq CDS 1829 2947 . + 0 ID=cds2;Name=YP_004895397.1;Parent=gene2;Dbxref=Genbank:YP_004895397.1,GeneID:11264394;gbkey=CDS;product=DNA polymerase III;protein_id=YP_004895397.1;transl_table=11 NC_016077.1 RefSeq CDS 2953 4101 . + 0 ID=cds3;Name=YP_004895398.1;Parent=gene3;Dbxref=Genbank:YP_004895398.1,GeneID:11264395;gbkey=CDS;product=recombination protein F;protein_id=YP_004895398.1;transl_table=11
So the bold one is sRNA. I want to extract a gene before and a gene after the sRNA sequence, in this case they are cds1 and cds2. The output should be in tab delimited format so that I could extract the fasta sequence then. In a GFF, there are multiple sRNAs to extract and I have many GFF files to perform this task.
Anyone mind to to help?
Regards,