How to get gene coding sequences based on GFF file?
8.9 years ago
biolab ★ 1.4k

Hi everyone,

I have a draft genome FASTA file and a GFF annotation file. The GFF file looks like this:

9311_chr12      GLEAN   mRNA    17901210        17902763        0.90124 +       .       ID=9311_GLEAN_10008559;
9311_chr12      GLEAN   CDS     17901210        17901318        .       +       0       Parent=9311_GLEAN_10008559;
9311_chr12      GLEAN   CDS     17901418        17901486        .       +       2       Parent=9311_GLEAN_10008559;
9311_chr12      GLEAN   CDS     17901566        17901672        .       +       2       Parent=9311_GLEAN_10008559;
9311_chr12      GLEAN   CDS     17901722        17901755        .       +       0       Parent=9311_GLEAN_10008559;
9311_chr12      GLEAN   CDS     17902585        17902763        .       +       2       Parent=9311_GLEAN_10008559;
9311_chr04      GLEAN   mRNA    22207209        22208012        0.999282        -       .       ID=9311_GLEAN_10029041;
9311_chr04      GLEAN   CDS     22207209        22208012        .       -       0       Parent=9311_GLEAN_10029041;

My goal is to extract the gene coding sequences (without UTRs). I can filter the GFF file to keep only the CDS features, but how do I do the next step, i.e. extract the CDS sequences themselves? Thank you very much!
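Before extracting anything, it can help to group the CDS rows by their `Parent` attribute and confirm that the phase column (column 8) agrees with the segment lengths. Below is a minimal Python sketch using only the standard library. `parse_attributes`, `group_cds`, and `check_phases` are hypothetical helper names. The sketch assumes the real file is tab-separated, as GFF3 requires, and that each CDS has a single `Parent`.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn 'Key=Value;Key2=Value2;' into a dict (a trailing ';' is tolerated)."""
    return dict(
        field.strip().split("=", 1) for field in column9.split(";") if "=" in field
    )


def group_cds(gff_lines):
    """Group CDS rows by Parent ID, ordered 5' to 3' along the transcript.

    Each segment is a tuple (seqid, start, end, strand, phase) in the GFF's
    own 1-based, inclusive coordinates.
    """
    groups = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, blank lines and anything that is not a CDS row.
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        parent = parse_attributes(cols[8]).get("Parent")
        if parent is None:
            continue
        groups[parent].append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6], int(cols[7]))
        )
    for segments in groups.values():
        # On the minus strand the 5'-most CDS has the highest coordinates.
        segments.sort(key=lambda seg: seg[1], reverse=segments[0][3] == "-")
    return dict(groups)


def check_phases(segments):
    """Check that each phase follows from the previous segment and that the
    last segment ends on a codon boundary (segments must be ordered 5' to 3')."""
    expected = segments[0][4]
    for _, start, end, _, phase in segments:
        if phase != expected:
            return False
        length = end - start + 1
        # Bases left over after this segment's last whole codon determine how
        # many bases the next segment must supply first, i.e. its phase.
        expected = (3 - (length - phase) % 3) % 3
    return expected == 0
```

With tabs restored, both transcripts in the sample above pass the check. The five CDS segments of 9311_GLEAN_10008559 add up to 498 bp, or 166 codons.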

perl python gff • 7.6k views

Sorry for the duplicate post. I found the solution in the Biostars thread "Extract Cds Fastas From A Gff Annotation + Reference Sequence".

Thanks for your attention.

8.9 years ago
PoGibas 5.1k

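Convert the CDS features in your GFF to BED and pipe them to bedtools getfasta. GFF coordinates are 1-based and inclusive, while BED is 0-based and half-open, so subtract 1 from the start. Put the strand in column 6 and add -s so that minus-strand CDS are reverse-complemented. Note that getfasta writes one FASTA record per CDS segment; it does not join the segments of a transcript.

# Not tested
# Column 4 (record name, used by -name) holds the GFF attributes; column 6 holds the strand for -s.
awk -F'\t' 'BEGIN{OFS="\t"} $3=="CDS" {print $1, $4-1, $5, $9, ".", $7}' annotation.gff |
    bedtools getfasta -fi genome.fa -bed - -s -name -fo CDS.fa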
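Because getfasta emits each CDS segment as its own record, getting one coding sequence per transcript means concatenating the segments. Below is a minimal, standard-library Python sketch of that step. It assumes the genome fits in memory, the GFF is tab-separated, and every CDS row has exactly one `Parent`. The names `read_fasta`, `revcomp`, and `extract_cds` are illustrative and do not come from any library.

```python
# Complement table for A/C/G/T/N; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the ID is the header's first word."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(genome, gff_lines):
    """Return {parent_id: coding sequence} for every transcript with CDS rows."""
    segments = {}
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            f.strip().split("=", 1) for f in cols[8].split(";") if "=" in f
        )
        if "Parent" not in attrs:
            continue
        segments.setdefault(attrs["Parent"], []).append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6])
        )
    result = {}
    for parent, segs in segments.items():
        segs.sort(key=lambda seg: seg[1])  # ascending genomic order
        # GFF start..end (1-based, inclusive) is the Python slice [start-1:end].
        joined = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in segs)
        # Minus-strand transcripts are read from the opposite strand.
        result[parent] = revcomp(joined) if segs[0][3] == "-" else joined
    return result
```

For example, `cds = extract_cds(read_fasta(open("genome.fa")), open("annotation.gff"))` returns a dict keyed by transcript ID (e.g. `9311_GLEAN_10008559`), which can then be written out as FASTA. For minus-strand transcripts, joining the segments in genomic order and reverse-complementing the result gives the same sequence as reverse-complementing each segment and joining them in descending order.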

Thank you, PoGibas, your answer was very helpful.
