Question

Conversion in fasta and sequence analysis

5.4 years ago

demaioflavio • 0

Dear all, I am a beginner in sequencing analysis, so I apologize if my question is poorly phrased. I have sequenced some bacterial genomes and obtained FASTQ files. To process these files I used Illumina BaseSpace apps: Prokka, and BWA (to align the reads to the reference genome), which gave me a .bam file. From the .bam file I obtained a FASTA file using samtools, but it contains multiple sequences (contigs). Can I generate a single sequence? I need to search the resulting sequence for a specific region in order to find SNPs or indels.

Apart from the question above, is this pipeline correct, or do I have to modify it?

I am a microbiologist, and until now I have only used GUI software to obtain this kind of information, never programs like these.

Thank you very much

sequencing • 733 views

ADD COMMENT • link 5.4 years ago by demaioflavio • 0

There are a few misunderstandings in your post.

Prokka is not part of alignment or assembly; it only annotates a finished genome. Have you actually run an assembly step (using SPAdes, Velvet, SOAP, etc.)? Do any of these names seem familiar?

The short answer is that with Illumina data only (like you have) it's highly unlikely you'll get a single sequence for any but the shortest, simplest genomes. You would need to do a hybrid assembly with a long-read technology to 'close' or finish the genome. Without that, or manual primer walking of the gaps, a multi-FASTA/multi-GenBank file (if annotated) is as good as it gets.

That might be perfectly fine for finding some of your mutations of interest though.
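As a rough illustration of working with a multi-FASTA directly, here is a minimal Python sketch that reads every contig and reports where a short query sequence occurs, on either strand. The function names (`read_fasta`, `reverse_complement`, `find_region`) and the toy sequences are made up for this example, and it only does exact matching: it can locate a conserved flanking sequence next to your region of interest, but it does not call SNPs or indels itself.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {contig_name: sequence}."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_region(contigs: Dict[str, str], query: str) -> List[Tuple[str, int, str]]:
    """Return (contig, 0-based start, strand) for every exact match of query."""
    query = query.upper()
    patterns = [("+", query)]
    rc = reverse_complement(query)
    if rc != query:  # avoid reporting palindromic queries twice
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        for name, seq in contigs.items():
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(pattern, start + 1)
    return hits


# Toy multi-FASTA with two contigs (hypothetical data, not a real genome).
demo = """>contig_1
ACGTTTGACCA
GGATCC
>contig_2 len=12
TTTGGTCAAACC
"""

contigs = read_fasta(demo.splitlines())
for name, start, strand in find_region(contigs, "TTGACC"):
    print(name, start, strand)
```

For a real file you would pass an open file handle instead of `demo.splitlines()`, e.g. `read_fasta(open("assembly.fasta"))`, where the filename is a placeholder. Minus-strand positions are reported as the start of the reverse-complemented match on the contig as written in the file.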

ADD REPLY • link 5.4 years ago by Joe 21k