Blastn - matches with another species
10 months ago
davidmaimoun ▴ 50

Hello,

I am writing a workflow for Neisseria meningitidis. After the assembly (with SPAdes), I want to confirm that I really have this species before continuing.

So I added a blastn step: I picked reference genomes from NCBI and ran BLAST against them. When I run BLAST on the complete genome, I get very good hits to Neisseria meningitidis (max bit score 50000), but also a few hits to another bacterium (Salmonella enterica, max bit score 800).

On the other hand, when I cut my genome down and use only 500000 nucleotides, I don't get any hits to the other species.

Is this normal?

Is it better to run BLAST on only part of the genome?

How can I tell whether a bit score is good or not?

Thank you

blast blastn • 625 views

Using a local aligner like BLAST for sequence similarity searches on whole genomes does not seem like a good idea. You are going to see hits to other organisms (as you do above).
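On the bit-score question, a rough sketch may help. It uses the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. The function name and both lengths below are assumptions for illustration only (BLAST itself uses effective lengths and scores each query sequence separately), and the result is kept in log10 form to avoid floating-point underflow.

```python
import math

def log10_evalue(bit_score, query_len, db_len):
    """Return log10 of E = m * n * 2**(-S').

    Raw lengths stand in for BLAST's effective lengths, so this is
    only an order-of-magnitude estimate.
    """
    return math.log10(query_len) + math.log10(db_len) - bit_score * math.log10(2)

# Assumed sizes, for illustration only: a ~2.2 Mb query and a 50 Mb database.
QUERY_LEN = 2_200_000
DB_LEN = 50_000_000

for score in (800, 50000):
    print(score, round(log10_evalue(score, QUERY_LEN, DB_LEN), 1))
```

Under these assumptions both scores correspond to E-values far below any usual cutoff, so a bit score of 800 is a statistically real match. What separates the two species is how much of the genome the hits cover, not whether each individual hit is significant.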

Perhaps you should consider an alternative such as BBSketch: BBSketch - A Tool for Rapid Sequence Comparison

I am writing a workflow for Neisseria meningitidis. After the assembly (with SPAdes), I want to confirm that I really have this species before continuing.

So you are not sure whether the starting sample is pure Neisseria or whether it contains other organisms?


Hi, thank you for your help and sorry if I wasn't clear.

I'm almost 100% sure that it is Neisseria. But my boss still wants to add a check step, in case there was contamination or something.

So I picked reference assemblies of different species (including N. meningitidis) from NCBI RefSeq, created a database from them, and ran BLAST against it with my samples as the query.
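One way to read the result of such a search is to total the hits per species instead of looking at single bit scores. The sketch below assumes the search was written in tabular format (`-outfmt 6`, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name, the `species_of` mapping from subject IDs to species, and the three example rows are all made up for illustration; they are not taken from the actual run.

```python
from collections import defaultdict

def summarize_hits(lines, species_of):
    """Per species: number of hits, summed alignment length, best bit score.

    Summed lengths can double-count overlapping HSPs, so treat them
    as a rough indicator only.
    """
    summary = defaultdict(lambda: {"hits": 0, "aligned_bp": 0, "max_bitscore": 0.0})
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        sseqid, length, bitscore = fields[1], int(fields[3]), float(fields[11])
        entry = summary[species_of.get(sseqid, "unknown")]
        entry["hits"] += 1
        entry["aligned_bp"] += length
        entry["max_bitscore"] = max(entry["max_bitscore"], bitscore)
    return dict(summary)

# Hypothetical subject IDs and made-up rows, for illustration only.
SPECIES_OF = {"NM_ref": "Neisseria meningitidis", "SE_ref": "Salmonella enterica"}
EXAMPLE_ROWS = [
    "contig_1\tNM_ref\t99.2\t48000\t380\t4\t1\t48000\t1000\t49000\t0.0\t50000",
    "contig_2\tNM_ref\t98.9\t30000\t330\t2\t1\t30000\t60000\t90000\t0.0\t31000",
    "contig_7\tSE_ref\t91.0\t600\t50\t4\t10\t609\t2000\t2599\t0.0\t800",
]

for species, stats in summarize_hits(EXAMPLE_ROWS, SPECIES_OF).items():
    print(species, stats)
```

If nearly all of the aligned bases go to one species and the other species only collects a few short hits, that is more consistent with shared conserved regions than with a mixed sample, although it does not prove it.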


How about flipping this test around? You could create a simulated read dataset (Illumina, paired-end, 100 bp) from the Neisseria genome in RefSeq, then align those reads against your assembly. Look at the alignment percentage and the coverage (depth) across the genome. The former should be very high (depending on how similar your strain is to the reference).
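To make the idea concrete, here is a minimal sketch of the read simulation step. It samples error-free 100 bp read pairs from the forward strand of a sequence held in memory; the function names, the insert-size values and the toy genome are assumptions for illustration. A dedicated read simulator would also model sequencing errors, base qualities and both strands.

```python
import random

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_pairs(genome, n_pairs, read_len=100, insert_mean=300, insert_sd=30, seed=0):
    """Sample error-free paired-end reads as (start, read1, read2) tuples.

    read1 is the first read_len bases of a random fragment; read2 is the
    reverse complement of the fragment's last read_len bases.
    """
    if len(genome) < read_len:
        raise ValueError("genome is shorter than the read length")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Fragment length is drawn from a normal distribution and clamped
        # so that it is at least one read long and fits in the genome.
        frag_len = int(rng.gauss(insert_mean, insert_sd))
        frag_len = min(max(frag_len, read_len), len(genome))
        start = rng.randint(0, len(genome) - frag_len)
        fragment = genome[start:start + frag_len]
        pairs.append((start, fragment[:read_len], reverse_complement(fragment[-read_len:])))
    return pairs

# Toy random sequence standing in for the RefSeq genome.
toy_rng = random.Random(1)
TOY_GENOME = "".join(toy_rng.choice("ACGT") for _ in range(2000))
PAIRS = simulate_pairs(TOY_GENOME, n_pairs=50)
print(len(PAIRS), "pairs; first read1 starts at", PAIRS[0][0])
```

The pairs would then be written out as FASTA or FASTQ and aligned to the assembly with a read mapper, after which the mapped fraction and per-contig depth can be inspected.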


That seems like a great idea. Thanks for your help!
