Question: How to differentiate between "horizontal gene transfer" and "contamination" in NGS data?
14 months ago by
BioGeek80 wrote:

I am wondering what a suitable approach would be to filter out "contamination" from the genome of an organism known to undergo HGT. Any ideas and approaches are welcome.

Thanks for your help.

modified 11 months ago by Michael Dondrup43k • written 14 months ago by BioGeek80
14 months ago by
Carlo Yague3.2k wrote:

I suppose you are talking about DNA-seq data, right?

In the case of contamination, you can expect a roughly uniform distribution of the contaminant reads across the contaminant's genome, whereas if there is a true HGT event, only one or a few extra genes will be represented.

A possible approach to distinguish the two cases:

  1. Identify the likely origin of the contaminant/HGT using BLAST (for instance, E. coli).
  2. Take the reads that don't map to the genome of your model organism and map them to that possible contaminant.
  3. Look at the distribution of the mapped reads (for instance, with IGV). If the coverage is more or less uniform, then it's probably contamination, while if you have only one or a few spikes, then it probably comes from HGT.
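
If you would rather put a number on step 3 than eyeball it in IGV, here is a minimal sketch. It assumes you have per-base depths for the candidate contaminant genome as tab-separated `contig, position, depth` lines (for example from `samtools depth -a`, which also reports zero-depth positions). The function names, the 1 kb window and the 0.5 breadth cutoff are my own illustrative choices, not established thresholds.

```python
def read_depths(lines):
    """Parse 'contig<TAB>position<TAB>depth' lines into a list of depths."""
    return [int(line.rstrip("\n").split("\t")[2]) for line in lines if line.strip()]


def coverage_summary(depths, window=1000, min_depth=1):
    """Summarise how evenly reads cover one reference sequence.

    breadth         : fraction of positions with depth >= min_depth
    covered_windows : number of windows whose mean depth >= min_depth
    n_windows       : total number of windows
    """
    if not depths:
        raise ValueError("depths must not be empty")
    breadth = sum(1 for d in depths if d >= min_depth) / len(depths)
    window_means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        window_means.append(sum(chunk) / len(chunk))
    covered = sum(1 for m in window_means if m >= min_depth)
    return {
        "breadth": breadth,
        "covered_windows": covered,
        "n_windows": len(window_means),
    }


def looks_like(summary, breadth_cutoff=0.5):
    """Rough call: broad coverage suggests contamination, a few spikes suggest HGT.

    breadth_cutoff is an arbitrary illustrative value (assumption); tune it
    for your sequencing depth and genome size.
    """
    if summary["breadth"] >= breadth_cutoff:
        return "contamination"
    return "HGT candidate"
```

A result with high breadth points towards contamination; a low breadth with only a handful of covered windows is what you would expect from a transferred gene or two. Treat the call as a hint to follow up, not as proof either way.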
modified 14 months ago • written 14 months ago by Carlo Yague3.2k
11 months ago by
Bergen, Norway
Michael Dondrup43k wrote:

A tale of two tardigrades

The discussion about contamination vs. HGT is still in flux. You might want to have a look at the following interesting case in PNAS on real or false HGT in tardigrades:

  • Boothby TC, et al. (2015) Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proc Natl Acad Sci USA 112(52):15976–15981.
  • Koutsovoulos G, et al. (2016) No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini. Proc Natl Acad Sci USA 113:5053–5058.

In my opinion the evidence has shifted towards no HGT and towards contamination in this case.

In the following commentary, some guidelines for distinguishing contamination from HGT have been given:

  • Richards TA, Monier A (2016) A tale of two tardigrades. Proc Natl Acad Sci USA 113(18):4892–4894; published ahead of print April 15, 2016, doi:10.1073/pnas.1603862113
modified 11 months ago • written 11 months ago by Michael Dondrup43k
14 months ago by
planet earth
piet1.4k wrote:

First of all, it is more likely to see contamination than true HGT.

If it is contamination, you have to distinguish two cases: 1) contamination happened before preparation of the sequencing library, and 2) contamination happened after library preparation (your library was cross-contaminated with another library, likely during loading of the sequencer). Each case has some characteristic features. If it is contamination with fast-growing bacteria, you should especially see the high-copy-number genes of the contaminating bacterium, which are rRNA genes and plasmid-borne genes. In the case of contamination with another library, that library may have an insert size distinct from your main library; see "Plant viruses sequences are found in human brain Rna-seq sample: how to evaluate it?"
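
The insert-size check can be sketched roughly as follows. It assumes you have already extracted the absolute template lengths (the TLEN column of a SAM file) for properly paired reads mapping to your target genome and, separately, for the suspicious reads. `insert_size_shift` is a hypothetical helper name, and how large a shift counts as "distinct" is for you to decide.

```python
from statistics import median


def insert_size_shift(main_sizes, suspect_sizes):
    """Relative difference between the median insert sizes of two read sets.

    main_sizes    : absolute template lengths of reads from the main library
    suspect_sizes : absolute template lengths of the suspicious reads
    Returns 0.0 when the medians agree; larger values mean the suspect reads
    look as if they come from a library with a different insert size.
    """
    if not main_sizes or not suspect_sizes:
        raise ValueError("both read sets must contain at least one value")
    main_median = median(main_sizes)
    if main_median == 0:
        raise ValueError("median insert size of the main library is zero")
    return abs(median(suspect_sizes) - main_median) / main_median
```

A shift close to zero does not rule out contamination (it could have happened before library preparation), but a clear shift is hard to explain by HGT, because transferred DNA is sequenced as part of the same library.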

You can identify true HGT if you find a chimeric contig in which a stretch of foreign DNA sequence is inserted into your target genome and is flanked by sequences of your target organism at BOTH ends. The whole contig should show rather uniform read coverage, especially at the insertion sites.
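
As a rough sketch of both checks: assume each contig has been annotated (for instance from BLAST hits) as a sorted list of `(start, end, origin)` segments with origin `"host"` or `"foreign"`, and that you have per-base read depths for the contig. This data layout, the function names and the 50 bp junction window are assumptions made for illustration only.

```python
def flanked_on_both_sides(segments):
    """True if some foreign segment has host sequence directly on both sides.

    segments: list of (start, end, origin) tuples sorted by start,
              with origin either "host" or "foreign".
    """
    for i, (_, _, origin) in enumerate(segments):
        if origin != "foreign":
            continue
        left_ok = i > 0 and segments[i - 1][2] == "host"
        right_ok = i + 1 < len(segments) and segments[i + 1][2] == "host"
        if left_ok and right_ok:
            return True
    return False


def junction_depth_ratio(depths, junction, flank=50):
    """Lowest depth within `flank` bases of a junction, relative to the contig mean.

    Values near 1 mean reads span the junction as well as the rest of the
    contig; values near 0 suggest the junction is an assembly artefact.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        return 0.0
    lo = max(0, junction - flank)
    hi = min(len(depths), junction + flank)
    return min(depths[lo:hi]) / mean_depth
```

You would call `junction_depth_ratio` once for each end of the foreign segment. A contig that passes `flanked_on_both_sides` but has a ratio near zero at either junction is more likely a misassembly joining contaminant and host reads than a real insertion.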

written 14 months ago by piet1.4k