Question: How To Convert Fasta To Gff?
9.4 years ago by
Melissa0 wrote:

Hi, I'm trying to annotate a novel genome. How do I convert a FASTA file to a GFF3 file? Or must I go FASTA -> BED -> GFF? (In that case, how do I convert FASTA to BED?)


fasta gff • 18k views

I think the short answer is: you can't, for the reason Steve describes, unless the FASTA header description contains the required information (coordinates and strand).

written 9.4 years ago by Neilfws49k

I'm confused: a FASTA file has sequence information, and a BED or GFF file has coordinate information (with annotation optionally added). Can you give an example of, say, 1 or 2 records from a FASTA file and show us what the corresponding GFF file you want would look like?

written 9.4 years ago by Steve Lianoglou5.1k

I'm guessing that the FASTA file contains the sequence of a feature (say, mRNA) and that she wants to annotate that feature on the genome using a GFF file. So she'll have to align the FASTA file first, then convert the alignment file to BED in some manner.

written 9.3 years ago by Eric Fournier1.4k

I have the same question. I want to use MCScanX, which needs a GFF file, but I don't know how to get one. Since I'm Chinese and my English is poor, I'm not even sure how to ask the question. I hope someone can help me, thank you!

written 4.9 years ago by lo62900
9.3 years ago by
Scott Cain750
Scott Cain750 wrote:

So the question is, what do you want to do with this data? I can think of one reason why you might want to create a GFF out of a fasta file: loading EST or cDNA sequences into a Chado database, for example, where you have to specify type and other attribute information during the load, and in fact there is a tool called that comes with Chado. Outside of that use case, I can't think of another reason to do this. If you describe what you want to do in more detail, you might get a better answer.

9.3 years ago by
Lee Katz3.1k
Atlanta, GA
Lee Katz3.1k wrote:

You can't strictly convert FASTA to GFF, because FASTA contains sequence information and GFF contains location information. However, you can try to recover the location information from the defline. If your FASTA looks like this,

>CDS_0001 start=start stop=stop contig=contig strand=+ ...

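then you can try to make a GFF file by parsing the defline:

# $contig, $start, $stop, $strand and $attributes are assumed to have been parsed from the defline
print join("\t", $contig, "FASTAparser", "CDS", $start, $stop, '.', $strand, '.', $attributes) . "\n";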

At the very least, you will need contig/chromosome, start, and stop information.
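The defline-parsing idea above could be sketched end to end roughly as follows. This is only an illustration in Python rather than Perl, and it assumes the defline carries `key=value` pairs named `start`, `stop`, `contig` and `strand`, as in the example header. The function names `defline_to_gff3` and `fasta_to_gff3` and the `ID=` attribute are made up for this sketch, not part of any existing tool.

```python
import re


def defline_to_gff3(defline, source="FASTAparser", feature_type="CDS"):
    """Turn one FASTA defline into one GFF3 line.

    Returns None when contig, start, or stop is missing or not numeric.
    The key names (start, stop, contig, strand) are assumptions taken from
    the example defline, not a standard.
    """
    header = defline.lstrip(">").strip()
    if not header:
        return None
    seq_id = header.split()[0]

    # Collect key=value pairs such as start=100 stop=400 contig=contig1 strand=+
    fields = dict(re.findall(r"(\w+)=(\S+)", header))
    if not all(key in fields for key in ("contig", "start", "stop")):
        return None
    try:
        # GFF requires start <= end, so order the two coordinates.
        start, stop = sorted((int(fields["start"]), int(fields["stop"])))
    except ValueError:
        return None

    strand = fields.get("strand", ".")
    if strand not in ("+", "-"):
        strand = "."

    # Score and phase are left as '.', as in the one-liner above. Note that
    # GFF3 expects a phase of 0, 1, or 2 for CDS features, which a defline
    # like this does not provide.
    columns = [fields["contig"], source, feature_type,
               str(start), str(stop), ".", strand, ".", f"ID={seq_id}"]
    return "\t".join(columns)


def fasta_to_gff3(lines):
    """Yield a GFF3 header, then one line per defline that could be parsed."""
    yield "##gff-version 3"
    for line in lines:
        if line.startswith(">"):
            record = defline_to_gff3(line)
            if record is not None:
                yield record
```

To use it, open the FASTA file and print each line that `fasta_to_gff3(handle)` yields. Deflines without usable coordinates are skipped silently, so compare the number of output lines with the number of sequences.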

Details on how to properly format GFF can be found on this page: (coincidentally a GMOD webpage, which is what Scott Cain works on!)
