Incomplete alignment EMBOSS needleall
6 weeks ago

Hello everyone

I am trying to align query FASTQ nucleotide sequences against a reference (.fa) using the EMBOSS needleall alignment tool. The query file contains 2271611 sequences. However, the output SAM file reports alignment scores for only 881 of them. I would like to get an alignment score for every query sequence. The run also produces a needleall.error file, but it is empty. I cannot figure out what the reason could be.

Looking forward to your response.

Thanks

/home/user/EMBOSS-6.6.0/emboss/needleall -asequence X3.fa -bsequence X3rL41-post.merged.fastq -gapopen 10 -gapextend 0.5 -datafile 'EDNAFULL'   -outfile X3rL41_post_new.sam -aformat sam
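One way to confirm how many queries are missing is to compare the number of reads in the FASTQ file with the number of alignment records in the SAM output. Below is a rough sketch; the helper names count_fastq_reads and count_sam_records are hypothetical, and it assumes standard four-line FASTQ records and SAM header lines that start with '@'.

```shell
# Hypothetical helpers, for illustration only.

# Count reads in a FASTQ file, assuming exactly four lines per record
# (no wrapped sequence or quality lines).
count_fastq_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Count alignment records in a SAM file: every line that is not a
# header line (header lines start with '@').
count_sam_records() {
  awk '!/^@/ { n++ } END { print n + 0 }' "$1"
}

# Example usage with the file names from the command above:
#   count_fastq_reads X3rL41-post.merged.fastq
#   count_sam_records X3rL41_post_new.sam
```

If the two numbers differ widely, as they do here, the aligner is either skipping records or misreading the input format.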

1

I'm not sure if you're using the appropriate tool here ...

needleall does pairwise comparisons of many sequences against many other sequences (usually the same set as the input). Moreover, I think it is meant for FASTA-formatted files (not FASTQ).

When aligning short reads to a reference you're better off using one of the NGS aligners: HiSat, Salmon?, STAR, bwa, ...

0

It does take a FASTQ file as input. The advantage of this tool is that it allows aligning FASTQ sequences against FASTA sequences.

1
6 weeks ago

I'm not sure it takes FASTQ as input (yes, the example on their site is named .fastq, but when you look at that file it is FASTA-formatted).

Anyway, all those NGS aligners take FASTQ as input and align against a FASTA reference, so I don't see the advantage here.

Moreover, needleall uses the Needleman-Wunsch algorithm, which is a global aligner, so it may not report a non-global alignment (which will be the case for most NGS reads, especially raw reads, aligned against a reference).

0

Thank you so much for your response. A simple tweak from fastq to fasta solved the problem and global alignment was turned off :-). Cheers.
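For anyone landing here with the same problem, the FASTQ-to-FASTA conversion can be sketched as below. This is a minimal example, assuming standard four-line FASTQ records (no wrapped lines); the function name fastq_to_fasta is hypothetical, and dedicated tools may be preferable for large or unusual files.

```shell
# Hypothetical helper, for illustration only.
# Convert four-line FASTQ records to FASTA: keep the header (line 1 of
# each record, with the leading '@' replaced by '>') and the sequence
# (line 2), and drop the '+' separator and the quality line.
# Records are identified by position, so a quality line that happens to
# start with '@' is not mistaken for a header.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }
       NR % 4 == 2 { print }' "$@"
}

# Example usage (the output file name is an assumption):
#   fastq_to_fasta X3rL41-post.merged.fastq > X3rL41-post.merged.fasta
```

The resulting FASTA file can then be passed to needleall via -bsequence in place of the FASTQ file.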

0
4 weeks ago

The number of reported alignments could be related to the -minscore (minimum alignment score) option.

I have checked this with a small fastq file:

$ needleall -minscore 12 -stdout ../../petase.fa SRR17458628_1.h1000.fastq -auto | wc -l
83
$ needleall -minscore 18 -stdout ../../petase.fa SRR17458628_1.h1000.fastq -auto | wc -l
33
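The same check can be repeated over several thresholds. Below is a minimal sketch, assuming needleall is on the PATH; the function name sweep_minscore and the NEEDLEALL override variable are hypothetical, and only options already shown in this thread (-minscore, -stdout, -auto) are used. As above, it counts raw output lines, not alignments.

```shell
# Hypothetical wrapper, for illustration only.
# NEEDLEALL may point at a specific needleall binary; defaults to the one on PATH.
NEEDLEALL="${NEEDLEALL:-needleall}"

# Usage: sweep_minscore REFERENCE QUERY SCORE [SCORE ...]
# Prints one "minscore<TAB>output_line_count" row per threshold.
sweep_minscore() {
  ref="$1"
  reads="$2"
  shift 2
  for score in "$@"; do
    # $(( ... )) also strips the padding some wc implementations add.
    n=$(( $("$NEEDLEALL" -minscore "$score" -stdout "$ref" "$reads" -auto | wc -l) ))
    printf '%s\t%s\n' "$score" "$n"
  done
}

# Example usage with the files from the post above:
#   sweep_minscore ../../petase.fa SRR17458628_1.h1000.fastq 12 18
```

If the line count drops as the threshold rises, the missing alignments are most likely being filtered by score and not lost to an input error.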