Question: modify gff file
14 months ago by
manishbiotechie0 wrote:

Hi, I want to modify my GFF file.

Right now it is in this form:

Ca_LG_1 EVM gene    41278   42503   .   -   .   ID=Ca_00005;
Ca_LG_1 EVM mRNA    41278   42503   .   -   .   ID=Ca_00005.1;Parent=Ca_00005;
Ca_LG_1 EVM exon    42292   42503   .   -   .   ID=Ca_00005.1.exon1;Parent=Ca_00005.1;
Ca_LG_1 EVM CDS 42292   42503   .   -   0   ID=Ca_00005.1.cds1;Parent=Ca_00005.1;
Ca_LG_1 EVM exon    41379   41745   .   -   .   ID=Ca_00005.1.exon2;Parent=Ca_00005.1;
Ca_LG_1 EVM CDS 41379   41745   .   -   2   ID=Ca_00005.1.cds2;Parent=Ca_00005.1;
Ca_LG_1 EVM exon    41278   41304   .   -   .   ID=Ca_00005.1.exon3;Parent=Ca_00005.1;
Ca_LG_1 EVM CDS 41278   41304   .   -   0   ID=Ca_00005.1.cds3;Parent=Ca_00005.1;
Ca_LG_1 EVM gene    71881   72641   .   +   .   ID=Ca_00006;
Ca_LG_1 EVM mRNA    71881   72641   .   +   .   ID=Ca_00006.1;Parent=Ca_00006;
Ca_LG_1 EVM five_prime_UTR  71881   71905   .   +   .   ID=Ca_00006.1.utr5p1;Parent=Ca_00006.1;
Ca_LG_1 EVM exon    71881   72641   .   +   .   ID=Ca_00006.1.exon1;Parent=Ca_00006.1;
Ca_LG_1 EVM CDS 71906   72481   .   +   0   ID=Ca_00006.1.cds1;Parent=Ca_00006.1;
Ca_LG_1 EVM three_prime_UTR 72482   72641   .   +   .   ID=Ca_00006.1.utr3p1;Parent=Ca_00006.1;
Ca_LG_1 EVM gene    73915   74216   .   -   .   ID=Ca_00007;
Ca_LG_1 EVM mRNA    73915   74216   .   -   .   ID=Ca_00007.1;Parent=Ca_00007;
Ca_LG_1 EVM exon    74113   74216   .   -   .   ID=Ca_00007.1.exon1;Parent=Ca_00007.1;
Ca_LG_1 EVM CDS 74113   74216   .   -   0   ID=Ca_00007.1.cds1;Parent=Ca_00007.1;
Ca_LG_1 EVM exon    73915   74008   .   -   .   ID=Ca_00007.1.exon2;Parent=Ca_00007.1;
Ca_LG_1 EVM CDS 73915   74008   .   -   2   ID=Ca_00007.1.cds2;Parent=Ca_00007.1;

and I want it to look like this:

1   araport11   gene    3631    5899    .   +   .   gene_id "AT1G01010"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding";
1   araport11   transcript  3631    5899    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding";
1   araport11   exon    3631    3913    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon1";
1   araport11   CDS 3760    3913    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   start_codon 3760    3762    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding";
1   araport11   exon    3996    4276    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon2";
1   araport11   CDS 3996    4276    .   +   2   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   exon    4486    4605    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "3"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon3";
1   araport11   CDS 4486    4605    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "3"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   exon    4706    5095    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "4"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon4";
1   araport11   CDS 4706    5095    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "4"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   exon    5174    5326    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "5"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon5";

I am new to bioinformatics; kindly help.


Hello manishbiotechie,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

fin swimmer

written 14 months ago by finswimmer

For future reference, please try to use more descriptive titles (e.g. the nature of the “modification” you want to make and so on).

written 14 months ago by Joe
14 months ago by
finswimmer12k wrote:

Hello manishbiotechie,

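It looks like you're trying to convert GFF to GTF. Your GFF does not contain enough information to produce exactly the output in your example: it only has ID and Parent attributes, while the target format also has fields such as gene_name, gene_biotype and gene_source.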
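
To see for yourself which attributes a GFF file actually carries, something like the following Python sketch could help. It assumes a tab-separated GFF3 file with `key=value;` pairs in column 9; the function name `attribute_keys` is made up for this illustration and is not part of any library.

```python
def attribute_keys(gff_lines):
    """Collect the attribute keys (column 9) seen for each feature type."""
    keys = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a regular 9-column feature line
        feature = cols[2]
        for pair in cols[8].split(";"):
            if "=" in pair:
                key = pair.split("=", 1)[0].strip()
                keys.setdefault(feature, set()).add(key)
    return keys


if __name__ == "__main__":
    # Two lines taken from the question, with tabs restored.
    sample = [
        "Ca_LG_1\tEVM\tgene\t41278\t42503\t.\t-\t.\tID=Ca_00005;",
        "Ca_LG_1\tEVM\tmRNA\t41278\t42503\t.\t-\t.\tID=Ca_00005.1;Parent=Ca_00005;",
    ]
    for feature, found in attribute_keys(sample).items():
        print(feature, sorted(found))
```

For a real file you would pass an open file handle instead of the sample list, e.g. `attribute_keys(open("input.gff"))`. If only ID and Parent show up, fields like gene_name cannot be derived from the file itself.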

You can try gffread for conversion:

$ gffread input.gff -T -o output.gtf

This will give you this output:

Ca_LG_1 EVM exon    41278   41304   .   -   .   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM exon    41379   41745   .   -   .   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM exon    42292   42503   .   -   .   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM CDS 41278   41304   .   -   0   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM CDS 41379   41745   .   -   1   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM CDS 42292   42503   .   -   0   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM exon    71881   72641   .   +   .   transcript_id "Ca_00006.1"; gene_id "Ca_00006";
Ca_LG_1 EVM CDS 71906   72481   .   +   0   transcript_id "Ca_00006.1"; gene_id "Ca_00006";
Ca_LG_1 EVM exon    73915   74008   .   -   .   transcript_id "Ca_00007.1"; gene_id "Ca_00007";
Ca_LG_1 EVM exon    74113   74216   .   -   .   transcript_id "Ca_00007.1"; gene_id "Ca_00007";
Ca_LG_1 EVM CDS 73915   74008   .   -   1   transcript_id "Ca_00007.1"; gene_id "Ca_00007";
Ca_LG_1 EVM CDS 74113   74216   .   -   0   transcript_id "Ca_00007.1"; gene_id "Ca_00007";

Alternatively, check whether the source where you got your GFF also offers a GTF file.
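
The gffread output shown above contains only exon and CDS lines. If you also need gene and transcript lines, as in your target example, a small script is one option. The following is only a rough Python sketch, not a replacement for gffread. It assumes tab-separated GFF3 input in which each gene and mRNA line appears before its children, as in your file. The function names `parse_attrs` and `gff3_to_gtf` are invented for this example. Column 8 (phase) is copied unchanged, other feature types such as five_prime_UTR are passed through as they are, and no exon_number or biotype fields are added because the input does not provide them.

```python
def parse_attrs(col9):
    """Parse a GFF3 attribute string like 'ID=x;Parent=y;' into a dict."""
    attrs = {}
    for pair in col9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(gff_lines):
    """Yield GTF-style lines with gene_id / transcript_id attributes.

    Assumption: parents are listed before their children.
    """
    tx_to_gene = {}  # transcript ID -> gene ID
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        feature = cols[2]
        if feature == "gene":
            if "ID" not in attrs:
                continue
            gtf_attrs = f'gene_id "{attrs["ID"]}";'
        elif feature == "mRNA":
            if "ID" not in attrs or "Parent" not in attrs:
                continue
            tx_to_gene[attrs["ID"]] = attrs["Parent"]
            cols[2] = "transcript"  # GTF uses "transcript" instead of "mRNA"
            gtf_attrs = (
                f'gene_id "{attrs["Parent"]}"; transcript_id "{attrs["ID"]}";'
            )
        else:
            tx = attrs.get("Parent")
            if tx not in tx_to_gene:
                continue  # child of an unknown transcript: skip it
            gtf_attrs = f'gene_id "{tx_to_gene[tx]}"; transcript_id "{tx}";'
        yield "\t".join(cols[:8] + [gtf_attrs])


if __name__ == "__main__":
    sample = [
        "Ca_LG_1\tEVM\tgene\t41278\t42503\t.\t-\t.\tID=Ca_00005;",
        "Ca_LG_1\tEVM\tmRNA\t41278\t42503\t.\t-\t.\tID=Ca_00005.1;Parent=Ca_00005;",
        "Ca_LG_1\tEVM\texon\t42292\t42503\t.\t-\t.\tID=Ca_00005.1.exon1;Parent=Ca_00005.1;",
    ]
    for gtf_line in gff3_to_gtf(sample):
        print(gtf_line)
```

Compare the result against the gffread output before using it downstream; for example, some CDS phase values in the gffread output above differ from those in the input GFF, whereas this sketch leaves them untouched.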

fin swimmer
