Question: Aligning short oligos containing Ns to a genome
11 months ago, dacotahm (United States) wrote:

Hello, I want to align short sequences that contain Ns to a genome, preserving the number of Ns and allowing no gaps or mismatches. For example:

CTCGAGNNNNNNATGTGG

I know matches exist:

$ grep 'CTCGAG......ATGTGG' OlGenomev3.fasta
TAACGATGGCAAAAGGAAAG**CTCGAGCCAAGCATGTGG**ACGATATATAATACGATCAAGG
TATTGATGTTACGAATGCGTGATAAATTAACAAATAATTC**CTCGAGCGTAAAATGTGG**GT

However, I have not been able to recover them as an alignment with blast or bwa. (The grep above misses any match that spans a line wrap in the FASTA file.) What is the best way to do this with an alignment program? I tried:

bwa aln -e 1 -e 6 -t 4 -M 1 -O 1 -E 1 Oligv3BWA ./BrookeGenes/BrookeSeqs.fasta > test3.bwa

and

blastn -query Seqs.fasta -db ../OlGenomev3.fasta -task 'blastn-short' -max_target_seqs 5 -word_size 4 -outfmt "6 qseqid sseqid qlen length qseq sseq pident sstrand" -gapopen 10 -penalty -1 -out SeqsXOlv3.csv

Both return no matches.

Thanks-

Tags: bwa, blast, alignment

Not sure about BLAST, but most NGS aligners treat ambiguous (non-ACGT) characters as mismatches. In your example that is 6 of 18 bases, so 33% of the query counts as mismatched, and those positions sit in the center of the read. I do not see how this could ever produce a valid alignment at such a short query length.

written 11 months ago by ATpoint
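To make the arithmetic in the comment above concrete, here is a minimal Python sketch that counts the ambiguous positions in the example query and the longest stretch of unambiguous bases. The function names are made up for illustration, and the snippet does not check how any particular aligner actually scores N.

```python
import re


def ambiguous_fraction(query):
    """Fraction of positions that are not A, C, G or T."""
    query = query.upper()
    return sum(base not in "ACGT" for base in query) / len(query)


def longest_unambiguous_run(query):
    """Length of the longest stretch made only of A, C, G and T."""
    runs = re.findall(r"[ACGT]+", query.upper())
    return max((len(run) for run in runs), default=0)


query = "CTCGAGNNNNNNATGTGG"
print(f"ambiguous: {ambiguous_fraction(query):.0%}")
print(f"longest exact stretch: {longest_unambiguous_run(query)} bases")
```

If every N is counted as a mismatch, a hit would need six mismatches in 18 bases with only 6-base exact flanks on either side, which is consistent with the commands in the question returning nothing.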

Hi, if your file is not too big, you can try this command. Here your query sequence acts as a motif, so the "N"s are preserved:

    seqkit.exe locate --degenerate --ignore-case --pattern-file your_query.fa file_containing_all_seqs.fasta

You can modify the command according to your requirement.

Hoping it may help you.

written 11 months ago by archana.bioinfo87180
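For anyone who wants to see what "query as a motif" amounts to, below is a rough Python sketch of the same idea. It is not how seqkit is implemented. It turns the degenerate query into a regular expression, joins wrapped FASTA lines so hits that span a line break are not lost, and scans both strands. The helper names (`read_fasta`, `to_regex`, `find_hits`) are hypothetical, and treating N as "any of A, C, G, T" (so an N in the genome does not match) is an assumption you may want to change.

```python
import io
import re

# IUPAC nucleotide codes as regex character classes.
# Assumption: N in the query matches A, C, G or T only.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def read_fasta(lines):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def to_regex(pattern):
    """Translate a degenerate pattern into a regex, one class per base."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_hits(pattern, name, seq):
    """Yield (name, start, end, strand, matched_text), 1-based inclusive.

    Coordinates and matched text always refer to the forward strand.
    """
    fwd = pattern.upper()
    rev = fwd.translate(COMPLEMENT)[::-1]  # reverse complement of the query
    strands = [("+", fwd)] if rev == fwd else [("+", fwd), ("-", rev)]
    for strand, pat in strands:
        # Lookahead keeps the match zero-width, so overlapping hits are found.
        regex = re.compile(f"(?=({to_regex(pat)}))")
        for m in regex.finditer(seq):
            yield name, m.start() + 1, m.start() + len(pat), strand, m.group(1)


# Demo: the first genome line from the question, wrapped inside the hit.
demo = io.StringIO(
    ">demo\n"
    "TAACGATGGCAAAAGGAAAGCTCGAGCC\n"
    "AAGCATGTGGACGATATATAATACGATCAAGG\n"
)
for name, seq in read_fasta(demo):
    for hit in find_hits("CTCGAGNNNNNNATGTGG", name, seq):
        print(*hit, sep="\t")
```

To run it on a real file, pass an open file handle to `read_fasta` instead of the `io.StringIO` demo. Each chromosome is held in memory as one string, so for a very large genome a dedicated tool is likely the better choice.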