I am relatively new to de novo assembly, so please bear with me if my question is very basic or appears naive. I am working on MiSeq data (2x150 bp) from a bacterial strain. I used Velvet (k-mer 51) and it gave me contigs which look like this:
>NODE_253250_length_121_cov_1.338843
TGCCTGCTCTTCTGCTTTTCTACCATGTTATGATGCAGTATGAACGCCCTTGCCAGAAGCTGCTGC
>NODE_253255_length_105_cov_1.000000
TGGAAGCCCCACTCTCAGTATTGACGTGCAAGTTCACAGTCTGGTTCCTGCCCCCGCGGT------
I also have a reference genome for the bacterium. Now I want to pinpoint in which samples the bacterium is present. Based on my reading of the literature, since the genome is small, I performed de novo assembly. However, how do I find out from the above contigs which ones are the best and most useful, and show that the bacterium is present? What parameters should I be using, contig length or something else, to decide which contigs are most useful? If I use BLAST pairwise alignment against the reference, it takes a while and returns a "Bad Gateway" error message, perhaps because the contig file is large (69523 words). Any suggestions or pointers would be highly appreciated.
It's troubling that some of the contigs in this assembly have a coverage of 1. You should try VelvetOptimiser if you are comfortable on the command line. Alternatively, you could remove these low-coverage contigs (maybe anything with coverage < 10 and length < 150).
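As a rough illustration of the filtering suggested here, the following Python sketch drops contigs by coverage and length. It assumes Velvet-style headers of the form NODE_<id>_length_<n>_cov_<coverage>, reads the coverage from the header, and measures length on the sequence itself. The function names and the example records are made up for illustration, and the thresholds (10 and 150) are only the rough values mentioned above, read here as "discard a contig if it fails either one".

```python
import re

# Assumption: the header carries the coverage as "cov_<number>" (Velvet style).
COV_RE = re.compile(r"cov_?([0-9]+(?:\.[0-9]+)?)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_cov=10.0, min_len=150):
    """Keep contigs whose coverage and sequence length both reach the cutoffs."""
    kept = []
    for header, seq in read_fasta(lines):
        match = COV_RE.search(header)
        if match is None:
            continue  # no coverage in the header, so the contig cannot be judged
        if float(match.group(1)) >= min_cov and len(seq) >= min_len:
            kept.append((header, seq))
    return kept


# Hypothetical records, not taken from the real assembly.
example = [
    ">NODE_1_length_200_cov_25.5",
    "ACGT" * 60,
    ">NODE_2_length_121_cov_1.338843",
    "ACGT" * 40,
    ">NODE_3_length_60_cov_40.0",
    "ACGT" * 20,
]

for header, seq in filter_contigs(example):
    print(header, len(seq))
```

To run this on a real assembly, pass an open file handle (for example open("contigs.fa")) instead of the example list. The cutoffs are worth adjusting to the sequencing depth of each sample.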
Can you be a little clearer about what your research question is? Are you sequencing a pure culture of an unknown bacterium (if so, why are you asking in which samples the bacterium is present)?
Sounds like you are asking two questions here: one about your methodology and another about your problem with BLAST. I think you need to clearly define your methodology and research question first. Second, we can try to figure out why you are having a BLAST error.
No, it is one basic question: I sequenced samples and want to see whether a particular bacterium is absent or present. I have done de novo assembly of the sequence data to generate contigs. How should I handle these contigs to identify which ones are significant and correspond to the reference bacterium?