Question: Distinguishing sequencing reads as prokaryotic or eukaryotic without a reference genome
5 months ago
I've got sequencing data from the microbiome of a eukaryote that does not have a reference genome. I have performed plenty of pre-sequencing steps to exclude as much eukaryotic DNA as possible; however, I still wish to determine whether any made it through after sequencing and assembly. What could I do to at least classify the reads as eukaryotic vs. prokaryotic?


5 months ago

Can you elaborate a little on what you have done already?

Off the top of my head, I don't think there is much you can do.

5 months ago by lieven.sterck

I extracted the guts of the organism and placed them in a digestion cocktail to create a single-cell suspension, which I then filtered to help break up any clumps. I stained the sample to prepare it for fluorescent cell sorting, and we size-separated the cells to exclude anything larger than 5 µm. Ideally, this should get rid of the eukaryotic cells and thus most, if not all, of the host DNA; however, there could be free-floating DNA from cells that may have burst. So we used qPCR to quantify the levels of host DNA before and after sorting, and we did see a decrease. We then proceeded with sequencing and assembly. This is the first time we've gone through this entire process as a whole, so once we received the assembly stats, my PI wanted one final check after the metagenome assembly to see if there were any eukaryotic reads still present. The problem is that there isn't a reference genome for the eukaryotic organism we're doing this experiment with. When we run this again in the future, we're likely going to add a DNase treatment after cell sorting to degrade any free-floating DNA.

5 months ago by darinshrewsberry1994

Just run all the reads you have through something like Centrifuge or Kraken and it'll fairly quickly identify what's what to a reasonable resolution.

It may even let you segregate just the ones you want too, but I'm not 100% sure.
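
For what it's worth, Kraken-style tools write a per-read output file, so read IDs can be split after the fact. This is only a minimal sketch, assuming the usual tab-separated per-read layout (first column `C` or `U`, second column the read ID, third the assigned taxid). The function name `split_read_ids` and the example lines are made up for illustration.

```python
from collections import defaultdict


def split_read_ids(kraken_lines):
    """Group read IDs by classification status.

    Assumes Kraken-style per-read output: tab-separated, with
    column 1 = 'C' (classified) or 'U' (unclassified) and
    column 2 = read ID. Anything that is not 'C' is treated as
    unclassified.
    """
    groups = defaultdict(list)
    for line in kraken_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        status, read_id = fields[0], fields[1]
        key = "classified" if status == "C" else "unclassified"
        groups[key].append(read_id)
    return dict(groups)


# Invented example lines, not real data.
example = [
    "C\tread_1\t562\t150\t562:116",
    "U\tread_2\t0\t150\t0:116",
    "C\tread_3\t9606\t150\t9606:116",
]
result = split_read_ids(example)
print(result)
```

The resulting ID lists could then be used to pull the matching records out of the original FASTQ with whatever read-filtering tool you prefer.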

5 months ago by jrj.healey

We did run a Kraken analysis and about 25% of the reads were classified, but we're not sure how much of the unclassified fraction is host and how much is bacteria that don't exist in the database. Given that our qPCR results suggested that we had little to no host DNA in our sample right before we sent it off for sequencing, we were a little stumped.
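
One way to see how the classified fraction splits between domains is to pull the domain-level rows out of the Kraken report. A rough sketch, assuming the standard six-column tab-separated report (percent, clade reads, direct reads, rank code, taxid, indented name) with rank code `D` for domain and `U` for unclassified. The function name and the numbers below are invented and are not from this dataset. Note that it says nothing about what the unclassified reads actually are.

```python
def domain_counts(report_lines):
    """Return {name: clade_read_count} for unclassified and domain rows.

    Assumes a Kraken-style report with six tab-separated columns:
    percent, clade reads, direct reads, rank code, taxid, name.
    """
    counts = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        clade_reads = int(fields[1])
        rank = fields[3].strip()
        name = fields[5].strip()  # names are indented by taxonomic depth
        if rank in ("U", "D"):
            counts[name] = clade_reads
    return counts


# Invented report lines for illustration only.
example_report = [
    " 75.00\t750\t750\tU\t0\tunclassified",
    " 25.00\t250\t0\tR\t1\troot",
    " 24.00\t240\t5\tD\t2\t    Bacteria",
    "  1.00\t10\t1\tD\t2759\t    Eukaryota",
]
counts = domain_counts(example_report)
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n} reads ({100 * n / total:.1f}%)")
```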

5 months ago by darinshrewsberry1994
5 months ago
There are several tools for this task; I personally like BlobTools for assembled draft genomes. Here is what you get:


I am plagiarizing myself (Interpreting mapping contaminants):

I like to use BlobTools (blasting against NCBI NT) to explore the taxonomic assignment of an assembly, and detect possible contamination - that is, I check for contaminants post-assembly.
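
Blob plots place each contig by GC content and coverage and color it by its best taxonomic hit, so host contigs often fall in a different cloud from bacterial ones. As a minimal sketch of the GC axis only (this is not BlobTools itself), assuming a plain FASTA assembly; `read_fasta`, `gc_fraction`, and the example sequences are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            header = line[1:].split()
            name = header[0] if header else ""  # keep the ID, drop the description
            seq = []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def gc_fraction(seq):
    """GC fraction over unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


# Made-up contigs for illustration.
fasta = [">contig_1 len=8", "ATGCGCGC", ">contig_2", "ATATATAT", "GCAT"]
gc = {name: gc_fraction(seq) for name, seq in read_fasta(fasta)}
for name, value in gc.items():
    print(f"{name}\t{value:.3f}")
```

Contigs whose GC content sits far from the bulk of the assembly are the ones worth checking against their BLAST hits first.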

You can also use sketches to analyse contamination either on your raw data (pre-assembly) or on assemblies, see:

Mash Screen: what's in my sequencing run?

What’s in my metagenome?

Tool: BBSketch - A Tool for Rapid Sequence Comparison
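
To give a feel for what a sketch is: these tools reduce each sequence set to a small sample of hashed k-mers and compare the samples instead of the full data. A toy illustration of the bottom-s MinHash idea follows, assuming a tiny k and sketch size for readability. It is not the implementation used by Mash or BBSketch, it ignores details such as reverse complements, and all names in it are made up.

```python
import hashlib


def kmer_hashes(seq, k):
    """Return the set of 64-bit hashes of all k-mers in seq."""
    seq = seq.upper()
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k=5, size=16):
    """Bottom-s sketch: the `size` smallest k-mer hashes, sorted."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(sk_a, sk_b, size=16):
    """Estimate k-mer Jaccard similarity from two bottom-s sketches."""
    merged = sorted(set(sk_a) | set(sk_b))[:size]
    if not merged:
        return 0.0
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Made-up sequences: identical input vs. input with no k-mers in common.
seq_a = "ACACACACACAC"
seq_b = "GGGGGGGGGGGG"
same = jaccard_estimate(sketch(seq_a), sketch(seq_a))
different = jaccard_estimate(sketch(seq_a), sketch(seq_b))
print(same, different)
```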

Finally, you can also use k-mer screening tools like Kraken or Centrifuge to screen for and filter out contaminants.

5 months ago by h.mon

Thanks, I'll give this a look.

5 months ago by darinshrewsberry1994
