4 months ago
Louise

Hi all,

When I was new to microbial analysis, I kept asking these questions. So I have summarized some answers to share with you, to help with bioinformatics issues in microbial genomics:

Q1: Why can't I get a good assembly from contaminated data?


Assembly software treats the sequencing data as if it all came from a single genome. If foreign DNA is mixed in, sequences from the different sources interfere with each other during assembly. To keep the assembly accurate, the assembler breaks contigs at positions where the sequences only partially agree. As a result, the final assembly consists of fragmented sequences.

Separating the contamination with the help of a reference genome has limits. If the contaminant is a species close to the reference genome, the foreign DNA may itself contain sequences similar to the reference. In addition, because the target genome may differ from the reference genome, the separation will produce some false positives and false negatives. Therefore, in any case, an assembly made after separation cannot reach the standard of one made from pure DNA.

Q2: How is the GC-depth map made? Does it have meaning?


The GC-depth graph shows the relationship between GC content and sequencing depth across the whole genome. It is made by cutting the genome sequence into windows of a fixed length. Each window has its own GC content and its own read coverage depth, and so corresponds to one point on the graph.
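
The windowing step can be sketched in Python as follows. This is only a minimal illustration, not the code of any particular tool: the function name `gc_depth_windows`, the default window size of 500 and the per-base `depth` list are my own assumptions, and in practice the depth values would come from aligning the reads back to the assembly.

```python
def gc_depth_windows(sequence, depth, window=500):
    """Return one (gc_percent, mean_depth) point per full, non-overlapping window.

    sequence: assembled sequence as a string (hypothetical input)
    depth:    per-base coverage, same length as sequence (hypothetical input)
    """
    if len(sequence) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    points = []
    # A trailing partial window is skipped so that all points are comparable.
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        gc_count = sum(1 for base in chunk if base in "GC")
        gc_percent = 100.0 * gc_count / window
        mean_depth = sum(depth[start:start + window]) / window
        points.append((gc_percent, mean_depth))
    return points
```

Each returned pair is one point on the GC-depth graph (GC content on one axis, depth on the other). Ambiguous bases such as N are counted in the window length here, which slightly lowers the GC value of windows that contain them.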

For a relatively pure sample, the points are concentrated in one area and spread out around it. If the GC-depth map splits into several concentrated areas, it generally means that the assembly contains DNA from different sources. If the areas are separated along the GC axis, external contamination is highly likely. If they are not separated by GC and differ only in depth, part of the DNA may come from a plasmid; this needs to be combined with other information, such as NT alignment results, for further analysis.
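
As a rough illustration of this reading of the graph, the sketch below labels each window by how far it sits from the main cloud. The function name `flag_windows` and both thresholds (`gc_tol`, `depth_ratio`) are arbitrary assumptions of mine, not standard values; in real work you would look at the plot itself and check suspicious contigs against NT.

```python
from statistics import median

def flag_windows(points, gc_tol=10.0, depth_ratio=2.0):
    """Label (gc_percent, mean_depth) points relative to the sample medians.

    gc_tol and depth_ratio are illustrative thresholds, not standard values.
    """
    gc_med = median(gc for gc, _ in points)
    depth_med = median(d for _, d in points)
    labels = []
    for gc, d in points:
        if abs(gc - gc_med) > gc_tol:
            # Separated along the GC axis: possible external contamination.
            labels.append("gc_shift")
        elif d > depth_med * depth_ratio or d < depth_med / depth_ratio:
            # Similar GC but different depth: possibly plasmid DNA.
            labels.append("depth_shift")
        else:
            labels.append("main")
    return labels
```

The labels are only a hint for which windows to inspect further; they do not prove contamination or plasmid origin on their own.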

Q3: Why can some plasmids in a complete-map (finished genome) sample be circularized while others cannot?


When we analyse a sample genome, the sequencing depth of the chromosome is usually about 100X. The sequencing depth of plasmids that assemble into a circle is about 80X, while that of plasmids that do not circularize is only about 20-40X.

Therefore, it is likely that the copy number of such a plasmid in the sample is low, so its sequencing depth is not high enough. That is why the plasmid assembly does not close into a circle.
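
The comparison can be written as simple arithmetic: depth relative to the chromosome is a rough proxy for copy number per chromosome. The helper below is a hypothetical sketch of my own (the name `relative_depth` is not from any tool), using the approximate depths quoted above.

```python
def relative_depth(plasmid_depth, chromosome_depth):
    """Plasmid depth divided by chromosome depth, a rough proxy for copy number."""
    if chromosome_depth <= 0:
        raise ValueError("chromosome_depth must be positive")
    return plasmid_depth / chromosome_depth

# Approximate depths from the answer above (chromosome about 100X).
circular = relative_depth(80, 100)      # plasmid that circularized
low_cover = relative_depth(20, 100)     # low end of the 20-40X range
```

A plasmid sitting at a fraction of the chromosome depth simply has fewer reads spanning its ends, which is consistent with the assembly failing to close.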

Q4: What are the methods for fungal gene prediction?


There are three methods for fungal gene prediction: ab initio (de novo) prediction, homology prediction and prediction based on transcriptome data. Augustus is used for ab initio prediction, and GeneWise is used for homology prediction.

Homology prediction is based on alignment, so coding gene sequences from the same or a related species need to be provided. The closer the relationship, the better the prediction. It is best to provide the coding gene information of related species or an assembled transcript sequence file. The prediction results of the three methods are then integrated with EVM (EVidenceModeler), so if a close reference sequence and transcriptome data can be provided, the integrated result will be better.

