Phylogenetic tree for my bins with abundance, genome size etc
5 months ago
aswin • 0

I would like to create a tree for my MAGs/bins like the one in the image shown below. Can anyone please share the steps/scripts in detail?

When I use gtdbtk de_novo_wf with --skip_gtdb_refs to analyze a set of bin files, I always end up with the following error.

[2023-11-20 12:55:35] INFO: Read custom taxonomy for 45 genomes.
[2023-11-20 12:55:35] INFO: Reassigned taxonomy for 45 GTDB representative genomes.
[2023-11-20 12:55:35] ERROR: GTDB-Tk classification and custom taxonomy files must not specify taxonomies for the same genomes.
[2023-11-20 12:55:35] ERROR: These files have 45 genomes in common.
[2023-11-20 12:55:35] ERROR: Example duplicate genome: bin.18
[2023-11-20 12:55:35] ERROR: Duplicated taxonomy information.
[2023-11-20 12:55:35] ERROR: Controlled exit resulting from an unrecoverable error or warning.

My script is below:

gtdbtk de_novo_wf \
    --genome_dir /zfs/camplab/Jojy/darpa_working/drep-output_directory/dereplicated_genomes \
    --out_dir /zfs/camplab/Jojy/darpa_working/gtdbtk_oct2023/de_novo_new \
    --extension fa \
    --bacteria \
    --gtdbtk_classification_file /zfs/camplab/Jojy/darpa_working/gtdbtk_oct2023/gtdbtk.bac120.summary.tsv \
    --cpus 40 \
    --outgroup_taxon p__Chloroflexota \
    --skip_gtdb_refs \
    --custom_taxonomy_file /zfs/camplab/Jojy/darpa_working/gtdbtk_oct2023/CUSTOM_TAXONOMY_FILE

These genomes have already been analyzed with classify_wf, which produced the taxonomy information. So I used gtdbtk.bac120.summary.tsv as the --gtdbtk_classification_file and made a custom taxonomy file from the same summary. Both are attached.

My custom_taxonomy_file is below:

bin.1 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__;s__
bin.10 d__Bacteria;p__Planctomycetota;c__Planctomycetia;o__Planctomycetales;f__Planctomycetaceae;g__;s__
bin.11 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__UBA4575;f__UBA4575;g__JABDMD01;s__
bin.13 d__Bacteria;p__Tectomicrobia;c__Entotheonellia;o__Entotheonellales;f__Entotheonellaceae;g__Entotheonella;s__
bin.14 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__UBA4486;f__UBA4486;g__;s__
bin.15 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__UBA6522;f__UBA6522;g__;s__
bin.16 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__GCA-001735895;f__GCA-001735895;g__GCA-001735895;s__
bin.17 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Woeseiales;f__Woeseiaceae;g__SZUA-117;s__
bin.18 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudohongiellaceae;g__UBA5109;s__
bin.19 d__Bacteria;p__Planctomycetota;c__Planctomycetia;o__Pirellulales;f__;g__;s__
bin.2 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Rhizobiaceae;g__JAALLB01;s__
bin.20 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Verrucomicrobiales;f__DEV007;g__;s__
bin.22 d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Methyloligellaceae;g__MnTg02;s__
bin.24 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__UBA6186;f__UBA6186;g__;s__
bin.25 d__Bacteria;p__Planctomycetota;c__Planctomycetia;o__Pirellulales;f__Pirellulaceae;g__Mariniblastus;s__
bin.26 d__Bacteria;p__Planctomycetota;c__Planctomycetia;o__Pirellulales;f__Lacipirellulaceae;g__Bythopirellula;s__
bin.27 d__Bacteria;p__Actinobacteriota;c__Acidimicrobiia;o__Acidimicrobiales;f__UBA11606;g__;s__
bin.28 d__Bacteria;p__Planctomycetota;c__PLA2;o__PLA2;f__JAEUHO01;g__;s__
bin.29 d__Bacteria;p__Planctomycetota;c__Planctomycetia;o__Pirellulales;f__Pirellulaceae;g__GCA-2726245;s__
bin.3 d__Bacteria;p__Planctomycetota;c__Planctomycetia;o__Pirellulales;f__Pirellulaceae;g__GCA-2723275;s__
bin.30 d__Bacteria;p__Acidobacteriota;c__Vicinamibacteria;o__Bin61;f__SMYC01;g__;s__
bin.31 d__Bacteria;p__Acidobacteriota;c__Thermoanaerobaculia;o__UBA5704;f__UBA5704;g__;s__
bin.32 d__Bacteria;p__Planctomycetota;c__Planctomycetia;o__Pirellulales;f__Lacipirellulaceae;g__Bythopirellula;s__
bin.33 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__UBA6522;f__UBA6522;g__;s__

I used gtdbtk.bac120.summary.tsv as the --gtdbtk_classification_file; the table is attached below.

Please help me.

Thank you in advance.

Image source: Bandla et al., 2020

Bins tree • 496 views
5 months ago
Mensur Dlakic ★ 27k

I think this is the key to the error message:

GTDB-Tk classification and custom taxonomy files must not specify taxonomies for the same genomes.

My guess is that you have to use a different --gtdbtk_classification_file or not use it at all. At least that's what the error said.
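To see which genomes trigger the "genomes in common" error, one option is to compare the genome IDs in the two files before running the workflow. The sketch below is only an illustration: it assumes the summary file is tab-separated with a header row and the genome ID in the first column, and that the custom taxonomy file is tab-separated with no header. The function names and the demo rows (including bin.99) are made up for the example.

```python
import csv
import io


def genome_ids(handle, has_header):
    """Collect the first-column genome IDs from a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if one is assumed
    return {row[0].strip() for row in reader if row and row[0].strip()}


def shared_genomes(summary_handle, custom_handle):
    """Return the sorted genome IDs that appear in both files."""
    summary_ids = genome_ids(summary_handle, has_header=True)
    custom_ids = genome_ids(custom_handle, has_header=False)
    return sorted(summary_ids & custom_ids)


# In-memory stand-ins for the two files (hypothetical rows).
summary = io.StringIO(
    "user_genome\tclassification\n"
    "bin.18\td__Bacteria;p__Proteobacteria\n"
    "bin.2\td__Bacteria;p__Proteobacteria\n"
)
custom = io.StringIO(
    "bin.18\td__Bacteria;p__Proteobacteria\n"
    "bin.99\td__Bacteria;p__Planctomycetota\n"
)
print(shared_genomes(summary, custom))  # ['bin.18']
```

With real files, the two StringIO objects would be replaced by open(path, newline="") handles. Any ID the function returns is one that both files assign a taxonomy to, which is what the error complains about.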

I occasionally do what you are doing but _without_ specifying --gtdbtk_classification_file and have no problem getting the result. If you end up with too many species in a tree, you can always extract the concatenated alignment only for the MAGs/genomes of interest and build a custom tree of your own.
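Extracting the alignment rows for selected genomes can be done with a few lines of Python. This is a minimal sketch, assuming the concatenated alignment is an uncompressed FASTA file whose record names are the genome IDs; the function names and the toy sequences are hypothetical, and the resulting subset would still need to be passed to a tree-building program of your choice.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # The record name is the first word after '>'.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def subset_alignment(lines, wanted):
    """Keep only the alignment records whose name is in `wanted`."""
    wanted = set(wanted)
    return [(name, seq) for name, seq in read_fasta(lines) if name in wanted]


def to_fasta(records):
    """Format (name, sequence) pairs back into FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy alignment standing in for the real concatenated alignment.
toy_alignment = """>bin.1
MK-LV
>GB_GCA_000000000.1
MKALV
>bin.18
MR-LV
""".splitlines()

print(to_fasta(subset_alignment(toy_alignment, ["bin.1", "bin.18"])))
```

For a real file, `toy_alignment` would be replaced by an open file handle (or a gzip.open handle in text mode if the alignment is compressed). The columns are left untouched, so the subset remains a valid alignment.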


Thank you for the quick response.

When I run it without --gtdbtk_classification_file, I get a different error.

[2023-11-21 10:14:08] ERROR: Uncontrolled exit resulting from an unexpected error.

================================================================================
EXCEPTION: TypeError
MESSAGE: Population must be a sequence. For dicts or sets, use sorted(d).

Traceback (most recent call last):
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/envs/py311/lib/python3.11/site-packages/gtdbtk/__main__.py", line 101, in main
    gt_parser.parse_options(args)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/envs/py311/lib/python3.11/site-packages/gtdbtk/main.py", line 1051, in parse_options
    self.root(options)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/envs/py311/lib/python3.11/site-packages/gtdbtk/main.py", line 776, in root
    reports = reroot.root_with_outgroup(options.input_tree,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/envs/py311/lib/python3.11/site-packages/gtdbtk/reroot_tree.py", line 83, in root_with_outgroup
    rnd_ingroup = random.sample(ingroup_leaves, 1)[0]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/envs/py311/lib/python3.11/random.py", line 439, in sample
    raise TypeError("Population must be a sequence. "
TypeError: Population must be a sequence. For dicts or sets, use sorted(d).

I am really new to GTDB-Tk. Can you please help me with this? I have already updated the aligner too. I am working on an HPC cluster.
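A note on the traceback: the final TypeError is raised by Python's own random.sample, and the paths show a Python 3.11 environment. From Python 3.11 on, random.sample no longer accepts a set as its population, so one possible explanation (a guess from the traceback alone, not confirmed against the GTDB-Tk source) is that ingroup_leaves is a set in the installed GTDB-Tk version. The sketch below reproduces that behaviour with made-up leaf names and shows the sorted() workaround that the error message itself suggests; pick_one is a hypothetical helper, not a GTDB-Tk function.

```python
import random


def pick_one(leaves):
    """Pick one element from a collection that may be a set."""
    # sorted() turns a set into a list, which random.sample accepts
    # on every Python 3 version.
    return random.sample(sorted(leaves), 1)[0]


# Hypothetical stand-in for the ingroup leaves in the traceback.
ingroup_leaves = {"bin.1", "bin.10", "bin.18"}

try:
    # Raises TypeError on Python 3.11 and later; older versions
    # accept a set here (3.9 and 3.10 with a DeprecationWarning).
    random.sample(ingroup_leaves, 1)
    sample_on_set_failed = False
except TypeError:
    sample_on_set_failed = True

print("random.sample on a set failed:", sample_on_set_failed)
print("picked via sorted():", pick_one(ingroup_leaves))
```

If this is indeed the cause, running GTDB-Tk in an environment with a Python version that the installed release supports, or updating GTDB-Tk itself, may avoid the error.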


I don't have the capacity to troubleshoot every single problem you may encounter with this program. There is a GitHub site where you can explain the files you used, the command and the error in greater detail. They should be able to help.
