Question

Identifying Unknowns from WGS samples

7.1 years ago

Harry ▴ 10

Greetings!

Introduction

I am a recent graduate starting off with work at a public health laboratory. This specific laboratory is just beginning its first few WGS runs on its new MiSeq, and I am the one who is charged with the analysis. I have done some work on sequences acquired through the NCBI website, but have never worked with actual lab samples in the past. Most of the samples are already identified by the time I get them; however, there will also be a few unknowns.

For the samples with known identities, I've used Sequencher to generate a consensus by mapping against a reference genome (after doing QC with FastQC and Trimmomatic).

What I Already Know

Each sequence is whole-genome
Each sample is an enteric
BLAST will be needed in some fashion
I will need to be able to create a consensus for my unknown
- (I don't know how to do this, without a reference genome)

The Actual Questions

What is the most simple/efficient way YOU might identify a bacterial WGS sample?
If I look for something conserved (say a 16s gene) to BLAST it, what would be the best way to find that gene?
Would it be possible to locally BLAST multiple 16s genes against my WGS to see which one matches?

Open For Discussion

For any of these questions: If you have any suggestions, solutions, or places where I can learn more about this topic, please let me know! Any help offered is greatly appreciated.

genome next-gen sequencing unknown sample • 1.7k views

score 0 · Answer 1 · 2017-08-08

What is the most simple/efficient way YOU might identify a bacterial WGS sample?

Are you looking at mixtures of species, or are these pure samples? In either case, taking a random sample of reads and BLASTing them at NCBI against refseq_genomes or refseq_representative_genomes should give you an idea of the genome you are looking at (at the genus level for sure, perhaps the species too).
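To make the "random sample of reads" step concrete, here is a minimal sketch in plain Python. It assumes standard 4-line FASTQ records; the function names (`read_fastq`, `sample_reads`, `to_fasta`) are made up for illustration and are not part of any tool mentioned in this thread. It uses reservoir sampling so the whole file never has to sit in memory, and it emits FASTA, which is what the NCBI BLAST web form accepts.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield (read_id, sequence) from an iterable of FASTQ lines (4 lines per record)."""
    handle = iter(handle)
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq = record[0].strip(), record[1].strip()
        yield header[1:].split()[0], seq


def sample_reads(handle, n, seed=0):
    """Return up to n reads chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(read_fastq(handle)):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = rec
    return sample


def to_fasta(records):
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)


# Hypothetical usage (file name is a placeholder):
#   import gzip
#   with gzip.open("sample_R1.fastq.gz", "rt") as fh:
#       print(to_fasta(sample_reads(fh, 20)))
```

A few dozen reads is usually plenty to paste into the web form; if the top hits disagree with each other at the genus level, that is a hint the sample may be mixed.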

If I look for something conserved (say a 16s gene) to BLAST it, what would be the best way to find that gene?

NCBI does have a 16S ribosomal sequence database available on the web (or you could download the pre-created 16S indexes and do the search locally).

Would it be possible to locally BLAST multiple 16s genes against my WGS to see which one matches?

You could try extracting the FASTA sequences from the above index and using them as queries against your data.
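A rough sketch of how that local search could be wired together, assuming the BLAST+ command-line tools are installed and that the downloaded 16S database is NCBI's `16S_ribosomal_RNA`. All file names, the local database name, and the e-value cutoff are placeholders I picked for illustration. The functions only build the command lines (run them with `subprocess.run(cmd, check=True)`) and parse the standard 12-column tabular output (`-outfmt 6`), keeping the best-scoring hit for each 16S reference.

```python
def blast_16s_commands(wgs_fasta, db_16s="16S_ribosomal_RNA",
                       refs_fasta="16S_refs.fasta",
                       local_db="unknown_wgs", out_tsv="16S_hits.tsv"):
    """Build (not run) the three BLAST+ command lines for a local 16S search."""
    # 1. Dump every sequence in the 16S database to a FASTA file.
    extract = ["blastdbcmd", "-db", db_16s, "-entry", "all", "-out", refs_fasta]
    # 2. Turn your own data (reads or contigs, as FASTA) into a nucleotide database.
    make_db = ["makeblastdb", "-in", wgs_fasta, "-dbtype", "nucl", "-out", local_db]
    # 3. Search the 16S references against it; the e-value cutoff is an arbitrary choice.
    search = ["blastn", "-query", refs_fasta, "-db", local_db,
              "-outfmt", "6", "-evalue", "1e-20", "-out", out_tsv]
    return extract, make_db, search


def best_hits(tsv_lines, top=5):
    """Keep the highest-bitscore hit per 16S reference, then return the top few.

    Expects default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in tsv_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hit = {"query": f[0], "subject": f[1], "pident": float(f[2]),
               "length": int(f[3]), "bitscore": float(f[11])}
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)[:top]
```

Reading the result is the easy part: the 16S references with near full-length alignments and the highest identity are your candidate genera. Keep in mind that 16S often cannot separate closely related enterics (E. coli and Shigella, for example), so treat it as a first pass.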

I will need to be able to create a consensus for my unknown

Take a look at SPAdes. It is likely the best bacterial genome assembler out there at this time.
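For reference, a basic paired-end SPAdes run can be sketched like this. It assumes `spades.py` is on your PATH and that the MiSeq run gave you paired R1/R2 FASTQ files; the file names, output directory, and thread count are placeholders. SPAdes writes the assembled contigs to `contigs.fasta` inside the output directory, and the two small helpers (my own, not part of SPAdes) summarize that file so you can sanity-check the assembly before BLASTing it.

```python
def spades_command(r1, r2, out_dir, threads=4):
    """Build (not run) a basic paired-end SPAdes command line."""
    return ["spades.py", "-1", r1, "-2", r2, "-o", out_dir, "-t", str(threads)]


def fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new contig
        elif line and lengths:
            lengths[-1] += len(line)   # sequence may span several lines
    return lengths


def n50(lengths):
    """Smallest contig length such that contigs this long or longer cover half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Hypothetical usage (paths are placeholders):
#   import subprocess
#   subprocess.run(spades_command("unk_R1.fastq.gz", "unk_R2.fastq.gz", "unk_spades"), check=True)
#   with open("unk_spades/contigs.fasta") as fh:
#       lengths = fasta_lengths(fh)
#   print(len(lengths), "contigs,", sum(lengths), "bp total, N50", n50(lengths))
```

The total assembly size is a useful clue on its own: most enterics land somewhere in the 4 to 6 Mb range, so a total far outside that suggests contamination or a failed run.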