How to check for contamination and separate the data?
7.8 years ago

Hello,

I have Illumina paired-end data (phage/bacteria+phage) which I have assembled using CLC Genomics Workbench. I ran blastn on the resulting scaffolds and found hits for both the bacterium and the phage. Now I am unsure whether BLAST alone is a reliable test that my data contains sequences from both the bacterium in question and its phage.

What other method can I use to check this, and how can I separate the reads belonging to the phage from those belonging to the bacterium?

WGS bacteriophage • 2.3k views
7.8 years ago
GenoMax 141k

Take a look at this thread: BBSplit syntax for generating builds for the reference genome and how to call different builds. You should be able to use BBSplit to separate your reads into genome-specific bins (with the exception of multi-mappers, which you will need to decide how to handle). I recommend that you start with the raw data: let BBSplit do the binning, and then do your assembly in CLC.
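For illustration, here is a rough sketch of what such a BBSplit call could look like, written as a small Python helper that only assembles the command line. The file names (`reads_R1.fastq.gz`, `bacteria.fasta`, `phage.fasta`) and the `split` output prefix are made-up placeholders, and `ambiguous2=toss` is just one possible way of dealing with multi-mappers, so check the BBSplit help text for your installed version before running anything.

```python
import shlex


def bbsplit_command(r1, r2, references, out_prefix="split", ambiguous2="toss"):
    """Assemble a BBSplit command line as a list of arguments.

    All paths and the output prefix are hypothetical placeholders.
    """
    return [
        "bbsplit.sh",
        f"in1={r1}",
        f"in2={r2}",
        # Comma-separated reference FASTA files, one per genome to bin against.
        "ref=" + ",".join(references),
        # '%' is replaced by the reference name, '#' by the read number (1 or 2).
        f"basename={out_prefix}_%_#.fq.gz",
        # Reads that map to none of the references.
        f"outu1={out_prefix}_unmapped_1.fq.gz",
        f"outu2={out_prefix}_unmapped_2.fq.gz",
        # How to treat reads that map equally well to more than one reference.
        f"ambiguous2={ambiguous2}",
    ]


cmd = bbsplit_command(
    "reads_R1.fastq.gz",
    "reads_R2.fastq.gz",
    ["bacteria.fasta", "phage.fasta"],
)

# Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
# once BBTools is on the PATH and the placeholder paths are replaced.
print(shlex.join(cmd))
```

Discarding ambiguous reads (`toss`) gives the cleanest bins but loses any sequence shared between the two genomes; the other `ambiguous2` settings trade that off differently.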

Do you know for sure that the phage is not integrated in the bacterial genome (or are there two phages)?

4.5 years ago

With the latest version of CLC Genomics Workbench there are several options for this:

  1. Use the Microbial Genomics Module's (MGM) binning tools to split the contigs from your assembly into bins, either by taxonomic similarity or by sequence/k-mer similarity (or both). "Pure bins" are probably good to go, but those that are not could be investigated further for contamination.
  2. Download UniVec from NCBI as a FASTA file, import it into CLC, and use the read mapper with stringent settings to filter your FASTQ for common contaminating vector sequences.
  3. Use MGM to do a full taxonomic profiling of your raw FASTQ data using a custom database of UniVec, phage, plasmid and bacterial genomes. Pull out the reads in the clades you want to keep, and then repeat your assembly with just those, with and without any "unclassified" reads.
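To give a feel for what binning by sequence/k-mer similarity means, here is a minimal Python sketch that computes GC content and a tetranucleotide frequency profile per contig and compares two profiles. This is not CLC's or MGM's algorithm, only a simplified illustration of the idea; the contig names and sequences are toy placeholders, and real binning tools typically also use canonical k-mers, coverage and much longer contigs.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_profile(seq, k=4):
    """Relative frequency of every possible k-mer, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing N or other ambiguity codes are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def profile_distance(p, q):
    """Manhattan distance between two k-mer profiles (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Toy sequences for illustration only; real contigs would be read from FASTA.
contigs = {
    "contig_1": "ATGCGCGGCCGCGTACGCGGCGCCGGATCGCGC",
    "contig_2": "ATTTAAATATTAAAGTATTTAATAAATTTACAT",
}

profiles = {name: kmer_profile(seq) for name, seq in contigs.items()}
for name, seq in contigs.items():
    print(name, "GC =", round(gc_content(seq), 3))
print("distance =", round(profile_distance(profiles["contig_1"], profiles["contig_2"]), 3))
```

Contigs whose GC content or k-mer profile differs strongly from the bulk of the assembly are candidates for belonging to a different genome, which is an independent line of evidence alongside BLAST hits.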