Converting Abricate output (.tsv) to gff3 format
ghataksnd, 11 weeks ago

Hello everyone,

I have TSV files generated by Abricate (https://github.com/tseemann/abricate) and need to convert them to GFF3 format, keeping some columns, reordering others, and deleting the rest.

We want to use these GFF3 files in downstream analyses and pipe them into other tools, but we have not managed to do the conversion.

Below are an example of my TSV input, the steps we think are needed, and the desired GFF3 output.

Any help will be much appreciated.


"Petr Ponomarenko" could you please help?

Input tsv file:

FILE SEQUENCE START END GENE COVERAGE COVERAGE_MAP GAPS %COVERAGE %IDENTITY DATABASE ACCESSION PRODUCT
UBird_Cyou_D3.fna BJCZ01000001.1 1866608 1867417 cdtB 1-810/810 =============== 0/0 100 90 vfdb CAD48850 (cdtB) cytolethal distending toxin B [CDT (VF0185)] [Escherichia coli O157:H str. 493/89]
UBird_Cyou_D3.fna BJCZ01000001.1 1867414 1868190 cdtA 1-777/777 =============== 0/0 100 90.61 vfdb CAD48849 (cdtA) cytolethal distending toxin A [CDT (VF0185)] [Escherichia coli O157:H str. 493/89]
UBird_Cyou_D3.fna BJCZ01000001.1 2245186 2246238 ompA 1-1041/1041 ========/====== 1/12 100 94.11 vfdb AAF37887 (ompA) outer membrane protein A [OmpA (VF0236)] [Escherichia coli O18:K1:H7 str. RS218]
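Because the input is tab-separated, it can help to read each hit by column name instead of counting positions. Below is a minimal Python sketch of that idea. The names read_abricate and sample are hypothetical, and the sketch assumes, as the post states, that the header row starts with "#" (the "#" is removed if present, so the first key is "FILE").

```python
import csv
import io
from typing import Dict, Iterable, List


def read_abricate(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per Abricate hit, keyed by the header names.

    Assumes tab-separated input whose first line is the header; a leading
    '#' on the header (e.g. '#FILE') is stripped so the key is 'FILE'.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:  # empty input
        return []
    header[0] = header[0].lstrip("#")
    # Skip blank lines, which csv.reader returns as empty lists.
    return [dict(zip(header, row)) for row in reader if row]


# First example row from the post, rebuilt with tabs between the columns.
sample = (
    "#FILE\tSEQUENCE\tSTART\tEND\tGENE\tCOVERAGE\tCOVERAGE_MAP\tGAPS\t"
    "%COVERAGE\t%IDENTITY\tDATABASE\tACCESSION\tPRODUCT\n"
    "UBird_Cyou_D3.fna\tBJCZ01000001.1\t1866608\t1867417\tcdtB\t1-810/810\t"
    "===============\t0/0\t100\t90\tvfdb\tCAD48850\t"
    "(cdtB) cytolethal distending toxin B [CDT (VF0185)] "
    "[Escherichia coli O157:H str. 493/89]\n"
)
hits = read_abricate(io.StringIO(sample))
print(hits[0]["FILE"], hits[0]["START"], hits[0]["END"], hits[0]["DATABASE"])
# prints: UBird_Cyou_D3.fna 1866608 1867417 vfdb
```

Looking columns up by name also means the code keeps working if an Abricate version writes additional columns, which would shift positional indices.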

What we think needs to be done (there may be other ways):

  1. Row 1 (the header, which always starts with "#") - replace it with the string "##gff-version 3"
  2. Col 1 - get rid of ".fna" and retain other data
  3. Insert new Col - print the string from Col 11 for all rows
  4. Col 2 - get rid of entire column
  5. Insert new Col - print "CDS" for all rows
  6. Col 3 - retain data
  7. Col 4 - retain data
  8. Insert new Col and print "." for all rows
  9. Insert new Col and print "+" for all rows
  10. Insert new Col and print "0" for all rows
  11. Col 5 to Col 10 - get rid of all these columns and data
  12. Col 11 - delete column
  13. Col 13 - keep the text but remove the characters "(", ")", "[" and "]"
  14. Add new Col - "ID=" followed by the Col 1 string from step 2, an underscore, and a running number that starts at 1 and increases by 1 for each row (for the example data: "UBird_Cyou_D3_1", "UBird_Cyou_D3_2", and so on). Then append ";product=" followed by the modified Col 13 text from the same row, so the ID and product strings are separated by ";". For the first example row, the finished column should read "ID=UBird_Cyou_D3_1;product=cdtB cytolethal distending toxin B CDT VF0185 Escherichia coli O157:H str. 493/89"
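The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a tested tool: it reads one Abricate file whose header row starts with "#FILE" (the "#" is optional), looks columns up by header name rather than position, and hard-codes type "CDS", score ".", strand "+" and phase "0" exactly as the steps request. ACCESSION (Col 12) is not mentioned in the steps but does not appear in the desired output, so it is dropped too. GFF3 requires tab-separated columns, so the fields are joined with tabs. The names abricate_to_gff3, clean_product and the script name are hypothetical.

```python
import csv
import re
import sys
from typing import Iterable, Iterator


def clean_product(product: str) -> str:
    """Step 13: remove '(', ')', '[' and ']' and keep every other character."""
    return re.sub(r"[()\[\]]", "", product)


def abricate_to_gff3(lines: Iterable[str]) -> Iterator[str]:
    """Yield GFF3 lines built from Abricate TSV lines (steps 1-14 of the post)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:                    # empty input: nothing to convert
        return
    header[0] = header[0].lstrip("#")     # '#FILE' -> 'FILE'
    col = {name: i for i, name in enumerate(header)}
    yield "##gff-version 3"               # step 1: header row -> GFF3 pragma
    n = 0
    for row in reader:
        if not row:                       # skip blank lines
            continue
        n += 1                            # running number for the ID (step 14)
        sample = re.sub(r"\.fna$", "", row[col["FILE"]])   # step 2
        product = clean_product(row[col["PRODUCT"]])       # step 13
        attrs = f"ID={sample}_{n};product={product}"       # step 14
        # Together these give the nine GFF3 columns: seqid, source (DATABASE),
        # type, start, end, score, strand, phase, attributes. Every other
        # Abricate column (SEQUENCE, GENE .. %IDENTITY, ACCESSION) is dropped.
        yield "\t".join([sample, row[col["DATABASE"]], "CDS",
                         row[col["START"]], row[col["END"]],
                         ".", "+", "0", attrs])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python abricate_to_gff3.py results.tab > results.gff3
    with open(sys.argv[1], newline="") as handle:
        for out_line in abricate_to_gff3(handle):
            print(out_line)
```

The ID counter restarts for each input file, which matches the single-file example. One caveat worth checking against your downstream tools: in GFF3, column 1 is meant to hold the ID of the sequence the coordinates refer to, so a tool that matches features to FASTA headers might expect the SEQUENCE value (e.g. BJCZ01000001.1) rather than the file name used here.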

Desired output (*.gff3) for the example data (only the first record is shown):

##gff-version 3
UBird_Cyou_D3 vfdb CDS 1866608 1867417 . + 0 ID=UBird_Cyou_D3_1;product=cdtB cytolethal distending toxin B CDT VF0185 Escherichia coli O157:H str. 493/89
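When the product text goes into column 9 like this, note that the GFF3 specification reserves ";", "=", "&" and "," in that column: inside a value they must be percent-encoded, as must tab, newline, carriage return, "%" and other control characters. No other characters may be encoded, so spaces, ":" and "/" stay as they are. The example product contains none of the reserved characters, but PRODUCT text from other Abricate databases might (an assumption worth checking on real output). A minimal Python helper, with hypothetical names gff3_escape and gff3_attributes:

```python
from typing import Iterable, Tuple


def gff3_escape(value: str) -> str:
    """Percent-encode the characters GFF3 reserves in column-9 values."""
    reserved = set("%;=&,\t\n\r")
    out = []
    for ch in value:
        # Encode reserved characters and ASCII control characters only;
        # a single pass means an existing '%' is never double-encoded.
        if ch in reserved or ord(ch) < 32 or ord(ch) == 127:
            out.append(f"%{ord(ch):02X}")
        else:
            out.append(ch)
    return "".join(out)


def gff3_attributes(pairs: Iterable[Tuple[str, str]]) -> str:
    """Build 'tag=value;tag=value' with each value escaped."""
    return ";".join(f"{tag}={gff3_escape(value)}" for tag, value in pairs)


# Hypothetical product text containing a comma, for illustration only.
print(gff3_attributes([("ID", "UBird_Cyou_D3_1"), ("product", "toxin, subunit B")]))
# prints: ID=UBird_Cyou_D3_1;product=toxin%2C subunit B
```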
