Question: Can I make GFF3 from either GFF or genome fasta file?
2.3 years ago by
jaqx00870
jaqx00870 wrote:

Hello all, I am trying to create a density map for a TE.bed file I have. I am using the chicken-repeats.inra.fr densitymap GUI (I don't know the details of how it works). The required input format is GFF3, and I have a GFF that looks like this:

Bf_V2_1     1   797 -   bf_rep_71           Unknown         Bf_V2_1  
    848 936 +   (TA)n               Simple_repeat   Bf_V2_1  
    1236    1369    -   CR1-11_BF           LINE/CR1        Bf_V2_1  
    2151    2171    +   (TA)n               Simple_repeat   Bf_V2_1  
    2351    3238    -   bf_rep_71           Unknown         Bf_V2_1  
    3229    3413    +   DNA-X-4_BF          DNA/Unknown     Bf_V2_1  
    3400    3506    +   Harbinger-N11_BF    DNA/Harbinger

Is there a way to convert this to GFF3? Or can I make GFF3 from FASTA? And is there another way to create the density map to show the locations of my transposable elements in the genome? I have read the suggestions on similar questions, and none has been very helpful. Thanks.


Is there a way to convert this to GFF3?

Are you asking whether you can convert a .bed file to a .gff file?

or can I make GFF3 from fasta?

No, you cannot generate a GFF3 file directly from a FASTA file. A GFF file stores annotation data (feature coordinates and attributes), whereas a FASTA file contains only sequences.

written 2.3 years ago by Sej Modha

By "is there a way to convert this to GFF3" I mean converting my GFF to GFF3.

written 2.3 years ago by jaqx008
2.3 years ago by
Beuss120
France
Beuss120 wrote:

I guess the formatting of your example got broken, so I will assume that your input file looks like this:

Bf_V2_1 1   797 -   bf_rep_71   Unknown
Bf_V2_1 848 936 +   (TA)n   Simple_repeat
Bf_V2_1 1236    1369    -   CR1-11_BF   LINE/CR1
Bf_V2_1 2151    2171    +   (TA)n   Simple_repeat
Bf_V2_1 2351    3238    -   bf_rep_71   Unknown
Bf_V2_1 3229    3413    +   DNA-X-4_BF  DNA/Unknown
Bf_V2_1 3400    3506    +   Harbinger-N11_BF    DNA/Harbinger

If the positions are 1-based, as in GFF, use this Perl one-liner:

perl -nae 'print "$F[0]\tmySource\t$F[4]\t$F[1]\t$F[2]\.\t$F[3]\tRepeatFamily=$F[5]\n"' TE.bed

Output:

Bf_V2_1 mySource    bf_rep_71   1   797.    -   RepeatFamily=Unknown
Bf_V2_1 mySource    (TA)n   848 936.    +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    CR1-11_BF   1236    1369.   -   RepeatFamily=LINE/CR1
Bf_V2_1 mySource    (TA)n   2151    2171.   +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    bf_rep_71   2351    3238.   -   RepeatFamily=Unknown
Bf_V2_1 mySource    DNA-X-4_BF  3229    3413.   +   RepeatFamily=DNA/Unknown
Bf_V2_1 mySource    Harbinger-N11_BF    3400    3506.   +   RepeatFamily=DNA/Harbinger

If the positions are 0-based, as in BED, use this Perl one-liner:

perl -nae 'print "$F[0]\tmySource\t$F[4]\t".($F[1] + 1)."\t$F[2]\.\t$F[3]\tRepeatFamily=$F[5]\n"' TE.bed

Output:

Bf_V2_1 mySource    bf_rep_71   2   797.    -   RepeatFamily=Unknown
Bf_V2_1 mySource    (TA)n   849 936.    +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    CR1-11_BF   1237    1369.   -   RepeatFamily=LINE/CR1
Bf_V2_1 mySource    (TA)n   2152    2171.   +   RepeatFamily=Simple_repeat
Bf_V2_1 mySource    bf_rep_71   2352    3238.   -   RepeatFamily=Unknown
Bf_V2_1 mySource    DNA-X-4_BF  3230    3413.   +   RepeatFamily=DNA/Unknown
Bf_V2_1 mySource    Harbinger-N11_BF    3401    3506.   +   RepeatFamily=DNA/Harbinger
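
The only difference between the two one-liners is the start coordinate: BED intervals are 0-based and end-exclusive, while GFF intervals are 1-based and end-inclusive, so the start shifts by one and the end stays the same. A minimal Python sketch of that shift (the function names are hypothetical, chosen only for illustration):

```python
def bed_to_gff_interval(bed_start, bed_end):
    """Map a BED interval (0-based, end-exclusive) to GFF (1-based, end-inclusive)."""
    # Only the start moves; the exclusive BED end equals the inclusive GFF end.
    return bed_start + 1, bed_end


def gff_to_bed_interval(gff_start, gff_end):
    """Inverse mapping: GFF (1-based, end-inclusive) back to BED."""
    return gff_start - 1, gff_end


# First row of the example, read as BED coordinates:
print(bed_to_gff_interval(1, 797))  # (2, 797)
```

If the coordinates in TE.bed are already 1-based, no shift is needed and the first one-liner applies.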

While I certainly like the Perl one-liner solution, there are a few problems with the GFF3-like output. Column 3 is better represented by a feature type such as CDS, mRNA, or a similar higher-order class; $F[4] is better placed in the attributes at the end; and there must not be a period glued to the end coordinate. GFF3 also expects nine tab-separated columns, so the unused score and phase columns each get a "." of their own:

perl -nae 'print "$F[0]\tmySource\tCDS\t$F[1]\t$F[2]\t.\t$F[3]\t.\tName=$F[4];RepeatFamily=$F[5]\n"' TE.bed

See the GFF3 specification for reference: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
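
For anyone who prefers a script to a one-liner, the same conversion can be sketched in Python. This is only a sketch under stated assumptions: the input has the six whitespace-separated columns shown in the answer above (sequence, start, end, strand, name, family); the function names are hypothetical; `mySource` and `RepeatFamily` are taken from this thread; and the default type `repeat_region` is my own choice of Sequence Ontology term for repeats, not something the densitymap tool is known to require.

```python
from urllib.parse import quote


def escape_attribute(value):
    """Percent-encode characters that are reserved in the GFF3 attributes column.

    quote() encodes ';', '=', '&', ',' and whitespace, among others; it encodes
    more than the spec strictly requires, which is harmless.
    """
    return quote(value, safe="/()")


def te_row_to_gff3(line, source="mySource", feature_type="repeat_region",
                   zero_based=False):
    """Convert one 'seqid start end strand name family' row to a 9-column GFF3 line."""
    seqid, start, end, strand, name, family = line.split()
    start = int(start) + 1 if zero_based else int(start)  # BED starts are 0-based
    attributes = f"Name={escape_attribute(name)};RepeatFamily={escape_attribute(family)}"
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join([seqid, source, feature_type, str(start), str(int(end)),
                      ".", strand, ".", attributes])


def rows_to_gff3(lines, **kwargs):
    """Yield the version header followed by one GFF3 line per non-empty input row."""
    yield "##gff-version 3"
    for line in lines:
        if line.strip():
            yield te_row_to_gff3(line, **kwargs)


example = [
    "Bf_V2_1 1   797 -   bf_rep_71   Unknown",
    "Bf_V2_1 848 936 +   (TA)n   Simple_repeat",
]
print("\n".join(rows_to_gff3(example)))
```

To convert a whole file, pass an open file handle instead of the `example` list, and set `zero_based=True` if the coordinates follow the BED convention. Note that the GFF3 specification reserves attribute tags beginning with an uppercase letter, so a strict validator may prefer a lowercase tag such as `repeat_family`.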

written 2.3 years ago by Carambakaracho

Sorry for the late response. I am trying this now and will post the results soon. Thanks.

written 2.3 years ago by jaqx008