Converting Abricate output (.tsv) to gff3 format
15 months ago
ghataksnd ▴ 20

Hello Everyone

I have a TSV file generated by Abricate (https://github.com/tseemann/abricate). I need to convert it to GFF3 format, with some columns retained, some reordered, and the rest deleted.

We want to use these GFF3 files in downstream applications and pipe them into other tools, but we have not been able to work out the conversion.

Below are an example of my TSV file, the steps we think are needed, and the desired output in GFF3 format.

Any help will be much appreciated.

"Petr Ponomarenko" could you please help?

Input tsv file:

#FILE   SEQUENCE    START   END GENE    COVERAGE    COVERAGE_MAP    GAPS    %COVERAGE   %IDENTITY   DATABASE    ACCESSION   PRODUCT
UBird_Cyou_D3.fna   BJCZ01000001.1  1866608 1867417 cdtB    1-810/810   =============== 0/0 100 90  vfdb    CAD48850    (cdtB) cytolethal distending toxin B [CDT (VF0185)] [Escherichia coli O157:H str. 493/89]
UBird_Cyou_D3.fna   BJCZ01000001.1  1867414 1868190 cdtA    1-777/777   =============== 0/0 100 90.61   vfdb    CAD48849    (cdtA) cytolethal distending toxin A [CDT (VF0185)] [Escherichia coli O157:H str. 493/89]
UBird_Cyou_D3.fna   BJCZ01000001.1  2245186 2246238 ompA    1-1041/1041 ========/====== 1/12    100 94.11   vfdb    AAF37887    (ompA) outer membrane protein A [OmpA (VF0236)] [Escherichia coli O18:K1:H7 str. RS218]

What we may need to do (there may be other ways too, I am not sure):

  1. Row 1 (always starts with "#") - Need to replace with the string "##gff-version 3"
  2. Col 1 - get rid of ".fna" and retain other data
  3. Insert new Col - print the string from Col 11 for all rows
  4. Col 2 - get rid of entire column
  5. Insert new Col - print "CDS" for all rows
  6. Col 3 - retain data
  7. Col 4 - retain data
  8. Insert new Col and print "." for all rows
  9. Insert new Col and print "+" for all rows
  10. Insert new Col and print "0" for all rows
  11. Col 5 to Col 10 - get rid of all these columns and data
  12. Col 11 - delete column
  13. Col 13 - retain data except "(", ")", "[", "]"
  14. Add new Col - Start with "ID=" followed by the string from Col 1 plus an underscore (for the example data, "UBird_Cyou_D3_") and a number that starts at 1 and increments by 1 for each row. Then append "product=" followed by the modified Col 13 data from the same row, with ";" separating the ID string from the product string. The finished column should look like "ID=UBird_Cyou_D3_1;product=cdtB cytolethal distending toxin B CDT VF0185 Escherichia coli O157:H str. 493/89"
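
One possible way to carry out the steps above is a short Python function. This is only a sketch under a few assumptions: the input is tab-separated with exactly the 13-column layout shown in the example (other Abricate versions may emit different columns), START and END are copied unchanged from Col 3 and Col 4, and a single counter numbers the IDs across the whole file. The function name `abricate_to_gff3` is made up for illustration, and the product text is not percent-encoded, so a product containing ";" or "=" would need extra handling.

```python
import re


def abricate_to_gff3(tsv_text):
    """Convert Abricate TSV text into GFF3 text (hypothetical helper)."""
    # Step 1: the "#FILE ..." header row is replaced by the GFF3 version line.
    out_lines = ["##gff-version 3"]
    counter = 0
    for line in tsv_text.splitlines():
        # Skip blank lines and the header row, which starts with "#".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 13:
            # Assumption: rows without all 13 columns are malformed; skip them.
            continue
        counter += 1
        # Step 2: drop the ".fna" suffix from Col 1.
        sample = re.sub(r"\.fna$", "", cols[0])
        # Step 3: Col 11 (DATABASE) becomes the GFF3 source column.
        source = cols[10]
        # Steps 6-7: Col 3 and Col 4 are kept as start and end.
        start, end = cols[2], cols[3]
        # Step 13: remove "(", ")", "[" and "]" from Col 13 (PRODUCT).
        product = re.sub(r"[()\[\]]", "", cols[12])
        # Step 14: build the attributes column.
        attributes = f"ID={sample}_{counter};product={product}"
        # Steps 5, 8, 9, 10: fixed values "CDS", ".", "+" and "0".
        fields = [sample, source, "CDS", start, end, ".", "+", "0", attributes]
        out_lines.append("\t".join(fields))
    return "\n".join(out_lines) + "\n"
```

For a file on disk, something like `print(abricate_to_gff3(open("abricate.tsv").read()), end="")` would write the result to standard output, where `abricate.tsv` is a placeholder file name.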

Desired final output (*.gff3) considering the example data:

##gff-version 3
UBird_Cyou_D3   vfdb    CDS 1866608 1867417 .   +   0   ID=UBird_Cyou_D3_1;product=cdtB cytolethal distending toxin B CDT VF0185 Escherichia coli O157:H str. 493/89