Question: How to differentiate between "horizontal gene transfer" and "contamination" in NGS data?
4.4 years ago by
BioGeek150 wrote:

I am wondering what a suitable approach would be for filtering "contamination" out of a genome that is known to contain HGT. Any ideas and approaches are welcome.

Thanks for your help.

4.1 years ago by
Bergen, Norway
Michael Dondrup48k wrote:

A tale of two tardigrades

The discussion about contamination vs. HGT is still in flux. You might want to have a look at the following interesting case in PNAS on real or false HGT in tardigrades:

  • Boothby TC, et al. (2015) Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proc Natl Acad Sci USA 112(52):15976–15981.
  • Koutsovoulos G, et al. (2016) No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini. Proc Natl Acad Sci USA 113:5053–5058.

In my opinion the evidence has shifted towards no HGT and towards contamination in this case.

In the following commentary, some guidelines for distinguishing contamination from HGT have been given:

  • Richards TA, Monier A (2016) A tale of two tardigrades (Commentary). Proc Natl Acad Sci USA 113(18):4892–4894. doi:10.1073/pnas.1603862113
4.4 years ago by
Carlo Yague5.2k wrote:

I suppose you are talking about DNA-seq data, right?

In the case of contamination, you can expect a roughly uniform distribution of the contaminant reads across the contaminant's genome, whereas after a true HGT event only one or a few foreign genes will be represented.

A possible approach to distinguish the two cases:

  1. Identify the origin of the contaminant/HGT using BLAST (for instance, E. coli).
  2. Take the reads that don't map to the genome of your model organism and map them to that possible contaminant.
  3. Have a look at the distribution of the mapped reads (for instance with IGV). If the coverage is more or less uniform, then it's probably contamination, whereas if you see only one or a few spikes, it probably comes from HGT.
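Step 3 can also be checked numerically instead of by eye. Below is a minimal sketch, assuming you have per-base depths on the candidate contaminant genome in the three-column `chrom<TAB>pos<TAB>depth` format (as written by `samtools depth -a`). The function names and the `min_breadth` cutoff of 0.5 are made up for illustration and are not calibrated on real data.

```python
from statistics import mean, pstdev


def parse_depth(lines):
    """Parse 'chrom<TAB>pos<TAB>depth' lines into a list of integer depths."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    return depths


def coverage_summary(depths):
    """Return breadth (fraction of positions with depth > 0), mean depth,
    and the coefficient of variation of depth."""
    n = len(depths)
    if n == 0:
        return {"breadth": 0.0, "mean_depth": 0.0, "cv": float("inf")}
    mu = mean(depths)
    breadth = sum(1 for d in depths if d > 0) / n
    cv = pstdev(depths) / mu if mu > 0 else float("inf")
    return {"breadth": breadth, "mean_depth": mu, "cv": cv}


def coverage_hint(depths, min_breadth=0.5):
    """Rough hint only: broad coverage of the foreign genome looks like
    contamination, a few isolated spikes look more like HGT.
    min_breadth is an arbitrary, hypothetical threshold."""
    summary = coverage_summary(depths)
    if summary["mean_depth"] == 0:
        return "no reads"
    if summary["breadth"] >= min_breadth:
        return "broad coverage (contamination-like)"
    return "localized spikes (HGT-like)"
```

For example, `coverage_hint(parse_depth(open("depth.txt")))` would give a first impression for a hypothetical `depth.txt`; any real decision should still be backed up by looking at the alignments.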
4.4 years ago by
planet earth
piet1.8k wrote:

First of all, it is more likely to see contamination than true HGT.

If it is contamination, you have to distinguish two cases: 1) the contamination happened before preparation of the sequencing library, or 2) it happened after library preparation (your library was cross-contaminated with another library, most likely while loading the sequencer). Each case has characteristic features. If the contaminant is a fast-growing bacterium, you should especially see its high-copy-number genes, namely rRNA genes and plasmid-borne genes. If the contaminant is another library, that library may have an insert size distinct from that of your main library; see "Plant viruses sequences are found in human brain Rna-seq sample: how to evaluate it?"
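The insert-size argument can be sketched in a few lines. This assumes paired-end data and that you have already collected the template lengths (the TLEN field, column 9 of a SAM record) separately for read pairs mapping to the target genome and for pairs mapping to the foreign sequence. The function name is hypothetical, and what counts as a "distinct" ratio is left to your judgment.

```python
from statistics import median


def insert_size_ratio(host_tlens, foreign_tlens):
    """Ratio of median insert size (foreign / host).

    TLEN is signed in SAM, so absolute values are used; zeros
    (no template length available) are ignored. Returns None when
    either group has no usable values.
    """
    host = [abs(t) for t in host_tlens if t != 0]
    foreign = [abs(t) for t in foreign_tlens if t != 0]
    if not host or not foreign:
        return None
    return median(foreign) / median(host)
```

A ratio close to 1 is compatible with the foreign reads coming from the same library as your target reads; a ratio far from 1 hints at cross-contamination from a different library.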

You can identify true HGT if you find a chimeric contig in which a stretch of foreign DNA sequence is inserted into your target genome and is flanked by sequences of your target organism at BOTH ends. The whole contig should show rather uniform read coverage, especially at the insertion sites.
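Both criteria (flanking on both sides, even coverage at the insertion sites) can be expressed as a rough check. The sketch below assumes you have already annotated a contig as a list of `(start, end, origin)` segments with half-open coordinates, where `origin` is `"host"` or `"foreign"` (for example from BLAST hits), plus a per-base depth list for the contig. The function names and the `min_flank` and `window` defaults are hypothetical choices, not established cutoffs.

```python
def flanked_foreign_segments(segments, min_flank=500):
    """Return (start, end) of foreign segments whose neighbouring
    segments on BOTH sides are host segments of at least min_flank bp."""
    ordered = sorted(segments)
    hits = []
    for i, (start, end, origin) in enumerate(ordered):
        if origin != "foreign":
            continue
        left = ordered[i - 1] if i > 0 else None
        right = ordered[i + 1] if i + 1 < len(ordered) else None
        left_ok = (left is not None and left[2] == "host"
                   and left[1] - left[0] >= min_flank)
        right_ok = (right is not None and right[2] == "host"
                    and right[1] - right[0] >= min_flank)
        if left_ok and right_ok:
            hits.append((start, end))
    return hits


def junction_depth_ratio(depths, junction, window=50):
    """Mean depth within +/- window bp of a junction divided by the
    mean depth of the whole contig. Values near 1 mean the junction is
    covered about as well as the rest of the contig."""
    if not depths:
        return 0.0
    lo = max(0, junction - window)
    hi = min(len(depths), junction + window)
    local = depths[lo:hi]
    overall = sum(depths) / len(depths)
    if not local or overall == 0:
        return 0.0
    return (sum(local) / len(local)) / overall
```

A foreign segment at the very end of a contig is not reported, since it is only flanked on one side. A junction ratio well below 1 suggests that few reads actually span the insertion site, which is what you would expect from a misassembly rather than a real insertion.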
