Question

How to segregate metagenomic sequences in prokaryotic and eukaryotic groups

8.0 years ago

dlpp • 0

Hi all,

I'm completely new to the metagenomics field, so I apologize in advance if this is a trivial or confusing post. I have several million Illumina HiSeq reads generated from deep-sea water. They are paired-end reads, but since the quality of the reverse reads is not very good, I'm planning to work only with the forward reads at first.

I'm particularly interested in analyzing the eukaryotic sequences using, for example, QIIME. However, because the files are very large (around 60 GB per sample), I'm not able to run even the fastest OTU-picking strategy in QIIME (ucrss_fast_O29_r97): after more than 10 hours with 320 GB of memory and 29 CPUs, it had not finished even the first step. I ran the same QIIME commands on a small part of one sample (around 1.3 GB) and they worked very well, so I guess the problem is the size of the file rather than a mistake in the commands.

I tried using Kraken to separate the prokaryotic from the eukaryotic data, but it did not work: most of the reads (96%) were unclassified, and I know from previous tests that this is not correct. I would like suggestions for separating the eukaryotic from the prokaryotic data so that I can proceed with OTU picking in QIIME using only the eukaryotes. Any suggestion will be deeply appreciated.

Thank you!

next-gen metagenomics 18s 16s eukarya • 2.0k views

updated 6.3 years ago by predeus ★ 1.9k • written 8.0 years ago by dlpp • 0

score 0 · Answer 1 · 2018-01-02

OK, I know this is a really old thread, but anyway: Kraken is classifying so few of your reads because you are using the default database, which contains only bacterial, archaeal, and viral genomes. You need a database that includes eukaryotes, probably the nt database (the default setting for web blastn).

I would recommend Centrifuge with its pre-built nt index (linked on the right side of https://ccb.jhu.edu/software/centrifuge/manual.shtml). It's very fast and reasonably space-efficient.
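
Once Centrifuge has classified the reads, they still have to be sorted into eukaryotic and prokaryotic groups. Here is a minimal Python sketch of one way to do that, not a tested pipeline. It assumes the classification was written with something like `centrifuge -x nt -U forward_reads.fq -S classification.tsv` (the file names and the index basename `nt` are placeholders), that the output has Centrifuge's usual tab-separated header with `readID` and `taxID` columns, and that you have `nodes.dmp` from the NCBI taxonomy dump. The function names are my own, not part of Centrifuge.

```python
import csv

# NCBI taxonomy IDs of the three domains.
DOMAINS = {2759: "eukaryota", 2: "bacteria", 2157: "archaea"}


def load_parents(nodes_dmp):
    """Parse NCBI nodes.dmp lines into a {tax_id: parent_tax_id} map."""
    parents = {}
    for line in nodes_dmp:
        fields = line.split("|")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        parents[int(fields[0].strip())] = int(fields[1].strip())
    return parents


def domain_of(tax_id, parents):
    """Walk up the lineage until a domain-level taxon or the root is reached."""
    seen = set()
    while tax_id not in seen:
        if tax_id in DOMAINS:
            return DOMAINS[tax_id]
        seen.add(tax_id)
        parent = parents.get(tax_id)
        if parent is None or parent == tax_id:
            break  # unknown taxID (e.g. 0 for unclassified) or the root
        tax_id = parent
    return "other"  # viruses, unclassified reads, unknown taxIDs


def split_read_ids(centrifuge_tsv, parents):
    """Group read IDs by domain from Centrifuge's tab-separated output."""
    groups = {}
    for row in csv.DictReader(centrifuge_tsv, delimiter="\t"):
        domain = domain_of(int(row["taxID"]), parents)
        groups.setdefault(domain, set()).add(row["readID"])
    return groups


if __name__ == "__main__":
    # Placeholder paths; adjust to your own files.
    with open("nodes.dmp") as nodes, open("classification.tsv") as hits:
        groups = split_read_ids(hits, load_parents(nodes))
    for domain, read_ids in sorted(groups.items()):
        print(domain, len(read_ids))
```

The read IDs in the `eukaryota` group can then be used to pull the matching records out of the FASTQ file before running QIIME. Two caveats: a read that Centrifuge assigns to several taxa appears on several rows and can therefore end up in more than one group, and keeping every read ID in memory may be too much for a 60 GB sample, so you may need to process the output in chunks.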