Question: Aligning short oligos containing Ns to a genome
4 weeks ago by
United States
dacotahm20 wrote:

Hello, I want to align short sequences that contain Ns to a genome, but I want to preserve the number of Ns and not allow gaps or mismatches. For example -


I know matches exist -

$ grep 'CTCGAG......ATGTGG' OlGenomev3.fasta

But I have not been able to recover them as alignments with BLAST or bwa. Note that the grep above misses any match that spans a line wrap in the FASTA file. What is the best way to do this with an alignment program?

bwa aln -e 1 -e 6 -t 4 -M 1 -O 1 -E 1 Oligv3BWA ./BrookeGenes/BrookeSeqs.fasta > test3.bwa


blastn -query Seqs.fasta -db ../OlGenomev3.fasta -task 'blastn-short' -max_target_seqs 5 -word_size 4 -outfmt "6 qseqid sseqid qlen length qseq sseq pident sstrand" -gapopen 10 -penalty -1 -out SeqsXOlv3.csv

Both commands return no matches.
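
On the line-wrap caveat: one way to avoid losing matches that span a wrapped line is to join each record's sequence lines before searching. Below is a minimal Python sketch of that idea, not an alignment program. The oligo `CTCGAGNNNNNNATGTGG` is inferred from the grep pattern above, and the toy record and the function names `read_fasta` and `find_oligo` are hypothetical, chosen only for illustration.

```python
import re

def read_fasta(lines):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def find_oligo(seq, oligo):
    """Return 0-based start positions of exact matches; N in the oligo matches any character."""
    # The lookahead makes overlapping matches visible as well.
    pattern = re.compile("(?=(" + oligo.upper().replace("N", ".") + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]

# Toy record (made up): the match spans two wrapped lines, so a
# line-by-line grep would miss it.
toy = [">chr_toy", "AACTCGAGAC", "GTACATGTGG", "TT"]
for name, seq in read_fasta(toy):
    print(name, find_oligo(seq, "CTCGAGNNNNNNATGTGG"))  # prints: chr_toy [2]
```

For a real genome you would pass an open file handle instead of the toy list. This only searches the forward strand.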


bwa blast alignment • 103 views
Not sure about BLAST, but most NGS aligners treat ambiguous (non-ACGT) characters as mismatches. In your example, 6 of 18 bases (33%) would be mismatches, all clustered in the center of the read. I do not see how this could ever be a valid alignment at such a short query length.

written 4 weeks ago by ATpoint26k

Hi, if your file is not too big, you can try this command. Your query sequence acts as a motif, so the Ns are preserved:

    seqkit.exe locate --degenerate --ignore-case --pattern-file your_query.fa file_containing_all_seqs.fasta

You can modify the command to suit your requirements.

Hoping it may help you.

written 4 weeks ago by archana.bioinfo87180
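
To illustrate what treating the query as a degenerate motif means, here is a rough Python sketch that expands IUPAC codes into a regular expression and scans both strands with no gaps or mismatches. It is not seqkit's implementation. The function names and the toy sequence are hypothetical. One assumption to be aware of: a query N here matches only A, C, G, or T in the subject, so an N in the genome itself does not match.

```python
import re

# IUPAC nucleotide codes as regex character classes.
# Assumption: query N matches A/C/G/T only, not an N in the subject.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def to_regex(oligo):
    """Compile an oligo into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in oligo.upper())
    return re.compile("(?=(" + body + "))")

def locate(seq, oligo):
    """Return (strand, start, end, matched_text) tuples.

    Coordinates are 0-based, half-open, on the forward strand;
    matched_text is the forward-strand subsequence.
    """
    seq = seq.upper()
    queries = [("+", oligo.upper())]
    rc = revcomp(oligo)
    if rc != oligo.upper():  # skip the duplicate scan for palindromic motifs
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        for m in to_regex(query).finditer(seq):
            hits.append((strand, m.start(), m.start() + len(query), m.group(1)))
    return hits

# Toy sequence (made up); the oligo is inferred from the grep pattern in the question.
for hit in locate("AACTCGAGACGTACATGTGGTT", "CTCGAGNNNNNNATGTGG"):
    print(hit)
```

Regex scanning like this is fine for a handful of oligos, but it reads the whole sequence once per strand per query, so for many queries a dedicated tool will scale better.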
