5 weeks ago

Hello, I have 700 metagenome-assembled genomes (MAGs) that were taxonomically classified against the GTDB database with GTDB-Tk.

So I have a taxonomic assignment for each of these MAGs, but for downstream analysis I need the FASTA headers to contain the taxonomy that GTDB-Tk assigned.

This is what the FASTA headers of one of the MAGs look like:

    cat cluster1_bin.101.fa | grep '>' | head

    >k141_1192826
    >k141_94001
    >k141_1104537
    >k141_375209
    >k141_375646
    >k141_742386
    >k141_560036
    >k141_12021
    >k141_838926
    >k141_1209697


I want to know if there is a way to extract the full taxonomy from the following table and append it to the FASTA headers of the corresponding MAG:

This is the desired output for the FASTA headers of each MAG, using "cluster1_bin.101.fa" as an example:

    >k141_1192826 Phylum Class Order Family Genus Species
    >k141_94001 Phylum Class Order Family Genus Species
    >k141_1104537 Phylum Class Order Family Genus Species
    >k141_375209 Phylum Class Order Family Genus Species
    >k141_375646 Phylum Class Order Family Genus Species
    >k141_742386 Phylum Class Order Family Genus Species
    >k141_560036 Phylum Class Order Family Genus Species
    >k141_12021 Phylum Class Order Family Genus Species
    >k141_838926 Phylum Class Order Family Genus Species
    >k141_1209697 Phylum Class Order Family Genus Species


Is there any way to do that using any programming language?


any way to do that using any programming language?

I think this can be done in practically any programming language of your choice. It is a simple FASTA header modification that can be done with existing libraries (BioPerl, Biopython), or with awk/sed to find the header lines and append the extra information to them. You will most likely need to write that script yourself, though.
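As a starting point, here is a minimal Python sketch of the header rewrite using only the standard library. It rests on several assumptions, since the table was not posted as text: that the taxonomy comes from a GTDB-Tk style tab-separated summary with `user_genome` and `classification` columns, that `user_genome` equals the MAG file name without its `.fa` extension, and that the lineage is a `d__...;p__...;...;s__...` string. The function names (`load_taxonomy`, `format_lineage`, `annotate_fasta`, `annotate_all`) are hypothetical, chosen for illustration only.

```python
import csv
from pathlib import Path


def load_taxonomy(summary_tsv):
    """Map genome name -> lineage string from a tab-separated summary table.

    Assumes columns named 'user_genome' and 'classification'.
    """
    taxonomy = {}
    with open(summary_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            taxonomy[row["user_genome"]] = row["classification"]
    return taxonomy


def format_lineage(classification):
    """Turn 'd__Bacteria;p__X;...;s__Y z' into 'X ... Y_z' (phylum to species).

    Rank prefixes are dropped, the domain is skipped, empty ranks are omitted,
    and spaces inside a name become underscores so fields stay space-separated.
    """
    names = []
    for rank in classification.split(";"):
        if "__" not in rank or rank.startswith("d__"):
            continue
        name = rank.split("__", 1)[1].strip()
        if name:
            names.append(name.replace(" ", "_"))
    return " ".join(names)


def annotate_fasta(fasta_in, fasta_out, lineage):
    """Copy a FASTA file, appending the lineage to every header line."""
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Strip stray whitespace after '>' before appending the lineage.
                dst.write(f">{line[1:].strip()} {lineage}\n")
            else:
                dst.write(line)


def annotate_all(mag_dir, summary_tsv, out_dir, ext=".fa"):
    """Annotate every MAG in mag_dir that has an entry in the summary table."""
    taxonomy = load_taxonomy(summary_tsv)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for fasta in sorted(Path(mag_dir).glob(f"*{ext}")):
        genome = fasta.name[: -len(ext)]  # cluster1_bin.101.fa -> cluster1_bin.101
        if genome not in taxonomy:
            continue  # no classification for this MAG; leave it out
        lineage = format_lineage(taxonomy[genome])
        annotate_fasta(fasta, out_dir / fasta.name, lineage)
```

It could be run as `annotate_all("mags", "summary.tsv", "mags_annotated")`, where all three paths are placeholders. Writing to a separate output directory keeps the original MAGs untouched; if your table has different column names or a different layout, `load_taxonomy` is the only part that should need changing.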


Please do not post images of the data.


You'll need to post the table in text form for us to be able to help easily.