Question: (Closed) protein coding gene from a gff3 file
2.3 years ago by
rajesh 60
India/Chandigarh/
rajesh 60 wrote:

I have a GFF3 file for a human chromosome. Its third column contains CDS, exon, transcript, and gene features, but I want to extract only the protein-coding genes from this file. Which feature type carries this information? I am attaching an excerpt of the file:

 Y  havana  five_prime_UTR  6246223 6246268 .   +   .   Parent=transcript:ENST00000429039
    Y   havana  exon    6246223 6246355 .   +   .   Parent=transcript:ENST00000429039;Name=ENSE00003749301;constitutive=0;ensembl_end_phase=0;ensembl_phase=-1;exon_id=ENSE00003749301;rank=1;version=1
    Y   havana  CDS 6246269 6246355 .   +   0   ID=CDS:ENSP00000414049;Parent=transcript:ENST00000429039;protein_id=ENSP00000414049
    Y   havana  exon    6246617 6246754 .   +   .   Parent=transcript:ENST00000429039;Name=ENSE00001608967;constitutive=0;ensembl_end_phase=0;ensembl_phase=0;exon_id=ENSE00001608967;rank=2;version=1
    Y   havana  CDS 6246617 6246754 .   +   0   ID=CDS:ENSP00000414049;Parent=transcript:ENST00000429039;protein_id=ENSP00000414049
    Y   havana  exon    6247356 6247433 .   +   .   Parent=transcript:ENST00000429039;Name=ENSE00003546295;constitutive=0;ensembl_end_phase=0;ensembl_phase=0;exon_id=ENSE00003546295;rank=3;version=1
    Y   havana  CDS 6247356 6247433 .   +   0   ID=CDS:ENSP00000414049;Parent=transcript:ENST00000429039;protein_id=ENSP00000414049
    Y   havana  exon    6247562 6247673 .   +   .   Parent=transcript:ENST00000429039;Name=ENSE00001623435;constitutive=0;ensembl_end_phase=1;ensembl_phase=0;exon_id=ENSE00001623435;rank=4;version=1
    Y   havana  CDS 6247562 6247673 .   +   0   ID=CDS:ENSP00000414049;Parent=transcript:ENST00000429039;protein_id=ENSP00000414049
    Y   havana  exon    6247775 6247920 .   +   .   Parent=transcript:ENST00000429039;Name=ENSE00001642228;
sequence assembly gene • 1.6k views
modified 2.3 years ago by WouterDeCoster 38k • written 2.3 years ago by rajesh 60

If you want the coding sequences, use gffread with the -C option. If you want the coordinates of the coding genes, use grep -i protein_coding input_gff
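Note that grep matches every line mentioning protein_coding, so transcript lines come through as well as gene lines. A minimal Python sketch that keeps only gene records is below. It assumes an Ensembl-style GFF3 in which gene lines carry a biotype=protein_coding attribute (the excerpt in the question does not show a gene line, so check your file first). The function names and the example records are made up for illustration.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute string 'k1=v1;k2=v2' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def protein_coding_genes(lines):
    """Yield GFF3 records of type 'gene' whose biotype is protein_coding.

    Assumption: the biotype lives in a 'biotype' attribute in column 9,
    as in Ensembl-style GFF3 files.
    """
    for line in lines:
        # Skip comments/directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(cols) != 9:
            continue
        if cols[2] != "gene":
            continue
        if parse_attributes(cols[8]).get("biotype") == "protein_coding":
            yield line.rstrip("\n")


# Hypothetical records, not taken from the file in the question.
example = [
    "##gff-version 3",
    "Y\thavana\tgene\t100\t900\t.\t+\t.\tID=gene:GENE_A;biotype=protein_coding",
    "Y\thavana\tmRNA\t100\t900\t.\t+\t.\tID=transcript:TX_A;Parent=gene:GENE_A;biotype=protein_coding",
    "Y\thavana\tgene\t2000\t2500\t.\t-\t.\tID=gene:GENE_B;biotype=lncRNA",
]

for record in protein_coding_genes(example):
    print(record)
```

To run it on a real file, pass an open file handle instead of the list, e.g. `with open("input_gff") as fh: ...`, where input_gff is the placeholder file name from the answer above.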

modified 2.3 years ago • written 2.3 years ago by Prasad 1.5k

Hello rajesh!

We believe that this post does not fit the main topic of this site.

OP does not respond to posts after asking questions. Question closed until OP responds.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree, please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

written 16 months ago by RamRS 21k
The thread is closed. No new answers may be added.
