Question: Separating mixed bacterial 16S rRNA sequences from eukaryotic 18S ITS sequences?
3.9 years ago by
Lionel120 wrote:

A bit of background: I have been given thousands of reads in FASTQ format containing sequences (16S and 18S ITS) sequenced on Illumina. The adapter indices have already been removed.

The plan is to separate the bacterial and fungal sequences and then BLAST them to determine community composition. Although the adapter indices aren't present, the eukaryotic sequences should contain an Illumina bottom primer sequence as well as the ITSF and ITS2 primer sequences, in both orientations.

What would be a good method to extract the reads that contain these primer sequences, taking into account that some reads will not match exactly, so a degree of mismatch has to be tolerated?

Any help would be greatly appreciated!

I would use the SILVAngs pipeline and sort out the bacterial and eukaryotic sequences. Just an idea.

Just thinking out loud. These are ideas rather than complete solutions:

  1. Why not BLAST them as is (against a db of 16S + 18S ITS)?
  2. Try the BBSplit method described in this thread, using a 16S reference DB: "BBSplit syntax for generating builds for the reference genome and how to call different builds". Some bacterial sequences may escape.
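
For the primer-based extraction asked about in the question, here is a minimal Python sketch of one possible approach. It scans each read for a primer or its reverse complement, allowing a fixed number of substitutions, and writes hits and non-hits to separate outputs. Everything here is an assumption for illustration: the function names are made up, the primer passed in must be replaced with the real ITS and Illumina primer sequences from your library prep, the input is assumed to be plain four-line FASTQ, and only mismatches (not insertions or deletions) are tolerated.

```python
from typing import Iterable, Iterator, TextIO, Tuple

# Complement table for unambiguous bases plus N (assumption: no other IUPAC codes).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming_hit(read: str, primer: str, max_mismatches: int) -> bool:
    """True if primer occurs anywhere in read with at most max_mismatches substitutions."""
    n = len(primer)
    for start in range(len(read) - n + 1):
        mismatches = 0
        for a, b in zip(read[start:start + n], primer):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window is too different, try the next one
        else:
            return True  # inner loop finished without exceeding the limit
    return False


def has_primer(read: str, primers: Iterable[str], max_mismatches: int = 2) -> bool:
    """Check a read against each primer in both orientations."""
    read = read.upper()
    for primer in primers:
        primer = primer.upper()
        if hamming_hit(read, primer, max_mismatches):
            return True
        if hamming_hit(read, reverse_complement(primer), max_mismatches):
            return True
    return False


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) from four-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def split_fastq(in_handle: TextIO, hit_handle: TextIO, miss_handle: TextIO,
                primers: Iterable[str], max_mismatches: int = 2) -> Tuple[int, int]:
    """Write primer-containing reads to hit_handle, the rest to miss_handle."""
    primers = list(primers)
    hits = misses = 0
    for record in read_fastq(in_handle):
        if has_primer(record[1], primers, max_mismatches):
            hit_handle.write("\n".join(record) + "\n")
            hits += 1
        else:
            miss_handle.write("\n".join(record) + "\n")
            misses += 1
    return hits, misses
```

A call might look like `split_fastq(open("reads.fastq"), open("its.fastq", "w"), open("other.fastq", "w"), primers=[...])`, where the file names are hypothetical. This pure-Python scan is slow on large datasets and ignores indels; dedicated tools such as cutadapt implement error-tolerant primer matching and are likely a better choice for production use.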
