Aligning short oligos containing Ns to a genome
0
0
Entering edit mode
4.5 years ago
dacotahm ▴ 20

Hello, I want to align short sequences that contain Ns to a genome, but I want to preserve the number of Ns and not allow gaps or mismatches. For example -

CTCGAGNNNNNNATGTGG

I know matches exist -

$ grep 'CTCGAG......ATGTGG' OlGenomev3.fasta
TAACGATGGCAAAAGGAAAG**CTCGAGCCAAGCATGTGG**ACGATATATAATACGATCAAGG
TATTGATGTTACGAATGCGTGATAAATTAACAAATAATTC**CTCGAGCGTAAAATGTGG**GT

But I have not been able to recover them as an alignment with blast or bwa. The above example excludes any matches lost by a line wrap. What is the best way to do this with an alignment program?

bwa aln -e 1 -e 6 -t 4 -M 1 -O 1 -E 1 Oligv3BWA ./BrookeGenes/BrookeSeqs.fasta > test3.bwa

and

blastn -query Seqs.fasta -db ../OlGenomev3.fasta -task 'blastn-short' -max_target_seqs 5 -word_size 4 -outfmt "6 qseqid sseqid qlen length qseq sseq pident sstrand" -gapopen 10 -penalty -1 -out SeqsXOlv3.csv

Return no matches.

Thanks-

alignment blast bwa • 778 views
ADD COMMENT
0
Entering edit mode

Not sure about BLAST but most NGS aligners consider ambiguous (non-ATCG) characters as mismatches. In you example 6/18 so 33% are mismatches and centered in the read. I do not see how this is ever going to be a valid alignment at that short query length.

ADD REPLY
0
Entering edit mode

Hi, If your file is not too big than you can try this command. Here your query sequence will act as a motif so it preserve "Ns"

    seqkit.exe locate --degenerate --ignore-case --pattern-file your_query.fa file_conatin_all_seq.fasta

You can modify the command according to your requirement.

Hoping it may help you.

ADD REPLY

Login before adding your answer.

Traffic: 1462 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6