Dear all, I am a rookie in data analysis and I am stuck with my results: I don't know how to interpret them.
I started with 7 metagenomic assemblies from different species of the Azolla fern. The aim was to identify bacteria in the leaf ecosystem of the different Azolla species. Our hypothesis was that if similar bacteria recur across the different Azolla species, they will cluster together when their genomes are plotted in a dendrogram or a tree.
The method: SPAdes was used to get the assemblies, BWA for backmapping, samtools for sorting, MetaBAT for binning, and CheckM to assess the completeness and contamination of the bins. Then Prokka was used to annotate the genomes; UniProt IDs were obtained and a table was made of all the UniProt IDs of all the bins. The table was converted to a binary table and then used to create a dendrogram in R.
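To make the binary-table step concrete, here is a minimal sketch in Python (the post itself used R). It turns per-bin sets of UniProt IDs into a presence/absence table and computes pairwise distances between bins. The bin names and IDs are made-up placeholders, and the Jaccard distance is an assumption: the post does not say which distance the dendrogram was built from.

```python
from itertools import combinations

# Hypothetical input: bin name -> set of annotation IDs.
# "U0001" etc. are placeholders, not real UniProt accessions.
bin_annotations = {
    "hostA_bin1": {"U0001", "U0002", "U0003"},
    "hostA_bin2": {"U0001", "U0002"},
    "hostB_bin1": {"U0003", "U0004"},
}

def binary_table(annotations):
    """Return (sorted IDs, {bin: [0/1, ...]}) as a presence/absence table."""
    all_ids = sorted(set().union(*annotations.values()))
    table = {
        name: [1 if uid in ids else 0 for uid in all_ids]
        for name, ids in annotations.items()
    }
    return all_ids, table

def jaccard_distance(row_a, row_b):
    """1 - |intersection| / |union| for two 0/1 rows of equal length."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

all_ids, table = binary_table(bin_annotations)

# Pairwise distances between bins, keyed by the sorted pair of bin names.
distances = {
    (a, b): jaccard_distance(table[a], table[b])
    for a, b in combinations(sorted(table), 2)
}
for pair, d in sorted(distances.items()):
    print(pair, round(d, 3))
```

A distance table like this could then be handed to any hierarchical clustering routine to draw the dendrogram.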
The dendrogram was then used to make a tree in FigTree. In the tree I observed that the bacteria cluster according to the metagenomic sample (plant host), not according to their taxonomy. For example, Rhizobiales clusters with Burkholderiales from the same metagenomic assembly, but not with Rhizobiales from the assembly of another host plant.
I'm at a dead end on how to interpret these results and what I can deduce from them. Are there other ways to improve my approach? Can I directly compare bins with similar taxonomy from different metagenomic assemblies? Any suggestions will be valuable.
Kind regards,
manpy, student, Utrecht University, Holland
Please take a moment to come up with a short title that is also informative/on point. Putting your actual question in the title is not a good practice.
OK sir, I will try to come up with a short question.
Can you clarify if fern sequences were excluded (either during library prep stage or by informatics afterwards)? Did you get an equivalent amount of data from all samples?
Yes sir, we chose to filter the plant DNA out of the sequencing data, since we were only interested in microbial DNA. Plant DNA was removed by mapping (aligning) the reads to a reference plant genome; only the reads that did not map to it were kept for further analysis. And yes, we got roughly equal amounts of data from the samples.
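As a rough illustration of the filtering idea described above (keeping only reads that do not align to the plant reference), here is a small Python sketch that reads SAM-formatted text and keeps records whose FLAG has the 0x4 "segment unmapped" bit set. The records are made up, and this is not necessarily how the filtering was actually done here. For paired-end data one would probably also want to require that the mate is unmapped, and in practice a dedicated tool (for example `samtools view -f 4`) is the usual choice.

```python
def is_unmapped(sam_line):
    """True if the SAM record has FLAG bit 0x4 (segment unmapped) set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
    return bool(flag & 0x4)

def keep_non_plant(sam_lines):
    """Yield names of reads that did not align to the plant reference."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        if is_unmapped(line):
            yield line.split("\t", 1)[0]

# Made-up SAM records (11 mandatory columns), for illustration only.
example = [
    "@SQ\tSN:plant_chr1\tLN:1000",
    "read1\t0\tplant_chr1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTA\tIIIII",
    "read3\t16\tplant_chr1\t50\t60\t5M\t*\t0\t0\tTTTTA\tIIIII",
]
kept = list(keep_non_plant(example))
print(kept)
```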
Thanks for clarifying that. Can you also comment on what happened to the sequences quantitatively? How much data did you lose and was an equivalent amount left over for every sample?
Perhaps you don't have enough data to draw a useful conclusion (there is no guarantee that the bacterial genomes were fully sampled, or that you did not lose useful data during the filtering).
BTW: There is no need to use honorifics. We are all fellow scientists.
We only used DNA extracted from the leaf pockets of these ferns, because they contain many symbiotic bacteria. The assembly of the scaffolds had already been made; I started by finding out how abundant each scaffold is in the different metagenomic samples. I aligned the Illumina reads (FASTQ files) to the scaffolds (FASTA) of the assembly in a step called backmapping, using a tool called BWA (Burrows-Wheeler Aligner), and then sorting was done.
I do not know how to check how much data we lost. The assemblies had already been created by my supervisor; I started from the backmapping step.
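One simple way to see how much data was lost, assuming both the raw and the plant-filtered FASTQ files are still available, is to count the reads in each and compare. The sketch below is only an illustration in Python: it assumes standard 4-line FASTQ records (no wrapped sequences) and uses tiny in-memory examples instead of real files.

```python
import io

def count_fastq_reads(handle):
    """Count reads in a FASTQ stream, assuming 4 lines per record."""
    n_lines = sum(1 for _ in handle)
    return n_lines // 4

def fraction_retained(n_before, n_after):
    """Fraction of reads that survived filtering (0.0 if there was no input)."""
    return n_after / n_before if n_before else 0.0

# Made-up data standing in for one sample's raw and filtered FASTQ files.
raw = io.StringIO(
    "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
)
filtered = io.StringIO("@r2\nGGCC\n+\nIIII\n")

n_raw = count_fastq_reads(raw)
n_filtered = count_fastq_reads(filtered)
print(n_raw, n_filtered, round(fraction_retained(n_raw, n_filtered), 3))
```

For real data, the `io.StringIO` objects would be replaced by `open(path)` (or `gzip.open(path, "rt")` for compressed files). Running this once per sample would show whether a comparable amount of data was left for each one.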