Question

Aligning short oligos containing Ns to a genome


4.5 years ago

dacotahm ▴ 20

Hello, I want to align short sequences that contain Ns to a genome while preserving the number of Ns and allowing no gaps or mismatches. For example:

CTCGAGNNNNNNATGTGG

I know matches exist -

$ grep 'CTCGAG......ATGTGG' OlGenomev3.fasta
TAACGATGGCAAAAGGAAAG**CTCGAGCCAAGCATGTGG**ACGATATATAATACGATCAAGG
TATTGATGTTACGAATGCGTGATAAATTAACAAATAATTC**CTCGAGCGTAAAATGTGG**GT

But I have not been able to recover them as an alignment with blast or bwa. The above example excludes any matches lost by a line wrap. What is the best way to do this with an alignment program?

bwa aln -e 1 -e 6 -t 4 -M 1 -O 1 -E 1 Oligv3BWA ./BrookeGenes/BrookeSeqs.fasta > test3.bwa

and

blastn -query Seqs.fasta -db ../OlGenomev3.fasta -task 'blastn-short' -max_target_seqs 5 -word_size 4 -outfmt "6 qseqid sseqid qlen length qseq sseq pident sstrand" -gapopen 10 -penalty -1 -out SeqsXOlv3.csv

Both return no matches.

Thanks-

alignment blast bwa • 783 views

ADD COMMENT • link 4.5 years ago by dacotahm ▴ 20


Not sure about BLAST, but most NGS aligners treat ambiguous (non-ATCG) characters as mismatches. In your example, 6 of the 18 bases (33%) would count as mismatches, and they sit in the center of the read. I do not see how this is ever going to be a valid alignment at such a short query length.

ADD REPLY • link 4.5 years ago by ATpoint 82k


Hi, if your file is not too big, you can try this command. Here your query sequence acts as a motif, so the Ns are preserved:

    seqkit.exe locate --degenerate --ignore-case --pattern-file your_query.fa file_containing_all_seqs.fasta

You can modify the command according to your requirement.

Hope this helps.
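If it helps to see what treating the query as a motif amounts to, here is a minimal Python sketch of the same idea. It is not how seqkit is implemented, just an illustration under a few assumptions: each N in the query is allowed to match any single character, other IUPAC codes are expanded to character classes, both strands are searched, and wrapped FASTA lines are joined first so that no match is lost at a line break (the limitation of the grep approach in the question). The function names `read_fasta`, `find_motif`, `motif_to_regex` and `reverse_complement` are made up for this example.

```python
import re

# IUPAC nucleotide codes expanded to regex fragments.
# Assumption: N matches any single character, like '.' in the grep above.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": ".",
}

# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the reverse complement of an (uppercased) IUPAC string."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a motif into a case-insensitive regex.

    The lookahead wrapper lets finditer() report overlapping matches.
    A character outside the IUPAC table raises KeyError.
    """
    body = "".join(IUPAC_REGEX[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))", re.IGNORECASE)


def read_fasta(lines):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split(" ")[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_motif(seq, motif):
    """Return sorted (start, end, strand, matched_text) tuples.

    Coordinates are 1-based and inclusive on the forward strand.
    Minus-strand hits are found by searching the forward sequence
    with the reverse-complemented motif.
    """
    strands = [("+", motif.upper())]
    rc = reverse_complement(motif)
    if rc != motif.upper():  # skip the duplicate search for palindromes
        strands.append(("-", rc))
    hits = []
    for strand, pattern in strands:
        for m in motif_to_regex(pattern).finditer(seq):
            start = m.start() + 1
            text = m.group(1)
            hits.append((start, start + len(text) - 1, strand, text))
    return sorted(hits)


if __name__ == "__main__":
    # File name taken from the question; adjust to your own path.
    with open("OlGenomev3.fasta") as handle:
        for name, seq in read_fasta(handle):
            for hit in find_motif(seq, "CTCGAGNNNNNNATGTGG"):
                print(name, *hit, sep="\t")
```

This holds each sequence in memory as one string, which is fine for moderately sized genomes. For anything large or repeated, the seqkit command above is likely the more practical route.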

ADD REPLY • link 4.5 years ago by archana.bioinfo87 ▴ 210