Question: Aligning short oligos containing Ns to a genome
dacotahm (United States) wrote, 4 weeks ago:

Hello, I want to align short sequences that contain Ns to a genome. I want to preserve the number of Ns (each N standing for exactly one base) and allow no gaps or mismatches. For example:

CTCGAGNNNNNNATGTGG

I know matches exist:

    $ grep 'CTCGAG......ATGTGG' OlGenomev3.fasta
    TAACGATGGCAAAAGGAAAGCTCGAGCCAAGCATGTGGACGATATATAATACGATCAAGG
    TATTGATGTTACGAATGCGTGATAAATTAACAAATAATTCCTCGAGCGTAAAATGTGGGT

However, I have not been able to recover them as alignments with BLAST or BWA. (The grep example above also misses any matches that span a FASTA line wrap.) What is the best way to do this with an alignment program?
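If an aligner is not strictly required, the grep approach can be extended to handle line wraps and the reverse strand. Below is a minimal Python sketch. It assumes the genome is a plain FASTA file small enough to hold one chromosome in memory at a time. It joins wrapped sequence lines before searching, treats each N as exactly one A/C/G/T, and searches for both the query and its reverse complement. This is regular-expression matching, not alignment, and the function names are hypothetical.

```python
import re

# Complement table for the characters expected in the query (A, C, G, T, N)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(query):
    """N matches exactly one A/C/G/T; the lookahead also reports overlapping hits."""
    body = "".join("[ACGT]" if c == "N" else re.escape(c) for c in query)
    return re.compile("(?=(" + body + "))")


def find_exact(query, genome_lines):
    """Return (header, strand, start, matched) for every ungapped, mismatch-free hit.

    start is 0-based on the forward sequence; matched is the forward-strand text.
    """
    query = query.upper()
    regexes = [("+", motif_regex(query)), ("-", motif_regex(revcomp(query)))]
    hits = []
    for header, seq in read_fasta(genome_lines):
        for strand, regex in regexes:
            for m in regex.finditer(seq):
                hits.append((header, strand, m.start(), m.group(1)))
    return hits


if __name__ == "__main__":
    # Toy genome built from the grep hits; chr2's hit is split across a line wrap
    demo = [
        ">chr1",
        "TAACGATGGCAAAAGGAAAGCTCGAGCCAAGCATGTGGACGATATATAATACGATCAAGG",
        ">chr2",
        "TATTGATGTTACGAATGCGTGATAAATTAACAAATAATTCCTCGAGCG",
        "TAAAATGTGGGT",
    ]
    for hit in find_exact("CTCGAGNNNNNNATGTGG", demo):
        print(hit)
```

To use it on the real data, pass an open file handle for OlGenomev3.fasta as `genome_lines`. For many oligos or a large genome, this pure-Python scan will be slow.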

    bwa aln -e 1 -e 6 -t 4 -M 1 -O 1 -E 1 Oligv3BWA ./BrookeGenes/BrookeSeqs.fasta > test3.bwa

and

    blastn -query Seqs.fasta -db ../OlGenomev3.fasta -task 'blastn-short' -max_target_seqs 5 -word_size 4 -outfmt "6 qseqid sseqid qlen length qseq sseq pident sstrand" -gapopen 10 -penalty -1 -out SeqsXOlv3.csv

Both return no matches.

Thanks!

Tags: bwa, blast, alignment

Not sure about BLAST, but most NGS aligners treat ambiguous (non-ACGT) characters as mismatches. In your example, 6 of 18 bases (33%) are Ns, so a third of the query counts as mismatches, and they sit in the middle of the read. I do not see how this is ever going to be a valid alignment at such a short query length.
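The arithmetic in this comment can be checked directly. The small Python sketch below uses a hypothetical `count_mismatches` helper. It compares the query with the first grep hit twice: once treating N as a mismatch, which is how the comment says most NGS aligners score it, and once treating N as a wildcard.

```python
def count_mismatches(query, target, n_is_wildcard):
    """Count ungapped mismatches between two equal-length sequences."""
    if len(query) != len(target):
        raise ValueError("sequences must have equal length")
    return sum(
        1
        for q, t in zip(query.upper(), target.upper())
        # a query N is forgiven only when it is treated as a wildcard
        if q != t and not (n_is_wildcard and q == "N")
    )


query = "CTCGAGNNNNNNATGTGG"
target = "CTCGAGCCAAGCATGTGG"  # first hit from the grep output in the question

as_mismatch = count_mismatches(query, target, n_is_wildcard=False)
as_wildcard = count_mismatches(query, target, n_is_wildcard=True)
print(as_mismatch, len(query), round(as_mismatch / len(query), 3))  # 6 18 0.333
print(as_wildcard)  # 0
```

With 6 of 18 positions already counted as mismatches, the genuine hit would likely exceed the mismatch limits aligners allow for a read this short. That is consistent with the empty BLAST and BWA results.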

Reply written 4 weeks ago by ATpoint

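Hi, if your file is not too big, you can try this command. Your query sequence is treated as a motif, so each N acts as a single-base wildcard and the number of Ns is preserved. Because seqkit reads each FASTA record as one sequence, hits that span line wraps are found as well.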

    seqkit.exe locate --degenerate --ignore-case --pattern-file your_query.fa file_containing_all_seqs.fasta

You can modify the command according to your requirements.
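To show roughly what a degenerate search involves, here is a Python illustration that expands IUPAC ambiguity codes into regular-expression character classes. It is not seqkit's actual implementation. The `IUPAC` table follows the standard nucleotide ambiguity codes, and `iupac_to_regex` is a hypothetical helper name.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Turn a degenerate DNA motif into a case-insensitive regex; each code spans one base."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError for characters that are not IUPAC codes
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts), re.IGNORECASE)


pattern = iupac_to_regex("CTCGAGNNNNNNATGTGG")
print(pattern.pattern)  # CTCGAG[ACGT][ACGT][ACGT][ACGT][ACGT][ACGT]ATGTGG
```

Since each code becomes exactly one character class, the pattern can only match a stretch of the same length as the query, so gaps are impossible by construction.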

Hope this helps.

Reply written 4 weeks ago by archana.bioinfo87