Hello all,

I set up a BLAST search as part of an automated typing program.

I have one question: how can I get the original sequences back from the output file?

For example,

/shared/MiSeq/BLAST/ncbi-blast-2.2.29+/bin/blastn -num_alignments 500 -word_size 50 -db ./test -query ./test.fasta -out ./test.out -outfmt 5

I get the output in XML format because of -outfmt 5.

I convert these results and handle the data in my own program.
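Because the report is XML, the fields needed for this kind of check can be pulled out with Python's standard-library xml.etree.ElementTree. The element names used below (Iteration_query-def, Iteration_query-len, Hit_def, Hit_len, Hsp_query-from/-to, Hsp_hit-from/-to, Hsp_qseq, Hsp_hseq) are those in BLAST+ -outfmt 5 reports; the Hsp class and iter_hsps function are hypothetical helpers, a minimal sketch rather than a complete parser. Note that Hsp_qseq and Hsp_hseq contain only the aligned portions, not the full original sequences, while the two *_len fields give the full lengths.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Hsp:
    query_def: str   # query defline as echoed in the report
    query_len: int   # full length of the raw read
    hit_def: str     # description of the database entry
    hit_len: int     # full length of the database sequence
    query_from: int  # 1-based, inclusive
    query_to: int
    hit_from: int    # 1-based, inclusive; hit_from > hit_to on the minus strand
    hit_to: int
    qseq: str        # aligned part of the read only (may contain '-')
    hseq: str        # aligned part of the database sequence only


def iter_hsps(root):
    """Yield one Hsp per HSP under a parsed BLAST XML root element."""
    for iteration in root.iter("Iteration"):
        query_def = iteration.findtext("Iteration_query-def")
        query_len = int(iteration.findtext("Iteration_query-len"))
        for hit in iteration.iter("Hit"):
            hit_def = hit.findtext("Hit_def")
            hit_len = int(hit.findtext("Hit_len"))
            for hsp in hit.iter("Hsp"):
                yield Hsp(
                    query_def, query_len, hit_def, hit_len,
                    int(hsp.findtext("Hsp_query-from")),
                    int(hsp.findtext("Hsp_query-to")),
                    int(hsp.findtext("Hsp_hit-from")),
                    int(hsp.findtext("Hsp_hit-to")),
                    hsp.findtext("Hsp_qseq"),
                    hsp.findtext("Hsp_hseq"),
                )


# Usage with the report written by -out ./test.out -outfmt 5:
# for h in iter_hsps(ET.parse("test.out").getroot()):
#     print(h.hit_def, h.hit_from, h.hit_to, h.hit_len)
```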

But is there any way to include the original sequence in the output? Or does anyone have an idea how to handle the cases below?

I have a raw sequence like the one below:


The "TTTTTTTTTTTTT" positions are Exon2.

I have a reference like the one below:


If I align these two, the alignment looks like this:


The BLAST search reports a 100% match, like this:


In this case I want to determine whether this raw sequence amplified 100% of that database segment.
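One way to put a number on "100% of that database segment" from the XML alone is the fraction of the database sequence spanned by an HSP, using Hit_len and Hsp_hit-from/Hsp_hit-to. This is a minimal sketch with a hypothetical function name. Note that it drops below 1.0 whenever BLAST trims mismatched bases at a subject end, even if the read does cover those bases, which is the situation described in the next paragraph.

```python
def subject_coverage(hit_from, hit_to, hit_len):
    """Fraction (0..1) of the database sequence covered by one HSP.

    Uses abs() because BLAST reports hit_from > hit_to on the minus strand.
    All coordinates are BLAST's 1-based inclusive values.
    """
    return (abs(hit_to - hit_from) + 1) / hit_len


print(subject_coverage(1, 8, 10))  # 0.8
```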

In this case, even though 2 mismatched bases are trimmed from the alignment, Exon2 is actually fully amplified.
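For the trimmed-mismatch case, one possible heuristic (an assumption, not something BLAST reports directly) is to compare, at each end, how many database bases lie outside the alignment with how many unaligned read bases lie beyond the alignment on the same side. If the read has at least as many, it physically extends over the whole database segment even though those end bases were not aligned. The sketch below assumes no indels in the trimmed ends and uses a hypothetical function name; it only checks that the read is long enough, not that the trimmed bases actually match.

```python
def spans_whole_subject(query_from, query_to, query_len,
                        hit_from, hit_to, hit_len):
    """Return True if the read extends over both ends of the subject.

    All coordinates are BLAST's 1-based inclusive values. Assumes the
    trimmed ends are ungapped, so unaligned read bases map 1:1 onto
    unaligned subject bases.
    """
    if hit_from <= hit_to:
        # Plus/plus: the read's left side faces the subject's start.
        subj_before, subj_after = hit_from - 1, hit_len - hit_to
    else:
        # Plus/minus: hit_from > hit_to, so the read's left side
        # faces the subject's end instead.
        subj_before, subj_after = hit_len - hit_from, hit_to - 1
    query_before = query_from - 1        # unaligned read bases on the left
    query_after = query_len - query_to   # unaligned read bases on the right
    return query_before >= subj_before and query_after >= subj_after


print(spans_whole_subject(6, 13, 20, 1, 8, 10))  # True
```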

But in some cases, like the one below, the segment is not 100% amplified.


In the example below, the segments are:

  Intron 1                          Exon2                 Intron 2

Here only Exon2 is fully amplified. I'd like to identify the fully amplified sequences (i.e., which segments are fully amplified).
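If the reference is one sequence with annotated intron/exon boundaries, deciding which segments are fully amplified reduces to interval containment on reference coordinates. In this sketch the segment layout and its coordinates are made up for illustration, and fully_covered_segments is a hypothetical helper. The covered range could be taken from the HSP's Hsp_hit-from/Hsp_hit-to, or widened by any trimmed ends you decide to count.

```python
def fully_covered_segments(segments, covered_start, covered_end):
    """Return names of segments lying entirely inside the covered range.

    segments: list of (name, start, end) on the reference, 1-based inclusive.
    covered_start/covered_end: reference range covered by the read, in
    either order (BLAST swaps them on the minus strand).
    """
    lo, hi = sorted((covered_start, covered_end))
    return [name for name, start, end in segments if lo <= start and end <= hi]


# Hypothetical layout; coordinates are invented for illustration only.
layout = [("Intron1", 1, 120), ("Exon2", 121, 390), ("Intron2", 391, 600)]
print(fully_covered_segments(layout, 95, 450))  # ['Exon2']
```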

I am thinking of getting this by comparing against the original sequences. Does anyone have a good idea? Thank you,

Are those repetitive nucleotides the actual sequence or just an inconvenient representation of your data?

written 19 days ago by WouterDeCoster

@WouterDeCoster That is not the actual sequence; it is just an example.

written 19 days ago by clear.choi

But in the first part Exon2 is TTT (repeated) and in the second part AAA (repeated)? I'm afraid I don't understand your question, and I'm not sure BLAST is the appropriate tool for your analysis. What are you really trying to achieve?

written 19 days ago by WouterDeCoster

I want to be able to tell whether a segment is fully amplified relative to the reference sequence using only the BLAST output, without checking the original reference and raw sequences again. If it is difficult to get that information from the BLAST output, I at least need the original sequences; then I can write my own calculation based on them.
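The full raw read is not stored in the XML report (only its aligned part and its length), but it is in the FASTA file passed with -query, so one option is to read that file and cut the read at the HSP's query coordinates. The database sequences themselves can be dumped with BLAST+'s blastdbcmd tool (check its -help for the options your version supports). This sketch assumes the first word of each FASTA header is the read ID you look up; read_fasta, split_query and the ID "read1" are hypothetical.

```python
def read_fasta(lines):
    """Return {first word of header: sequence} from FASTA lines.

    `lines` can be an open file or any iterable of strings.
    """
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Assumption: the first word of the header is the read ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def split_query(seq, query_from, query_to):
    """Split a read into (unaligned left, aligned, unaligned right).

    query_from/query_to are BLAST's 1-based inclusive Hsp_query-from/-to.
    """
    return seq[:query_from - 1], seq[query_from - 1:query_to], seq[query_to:]


# Usage (hypothetical read ID and coordinates):
# with open("test.fasta") as fh:
#     reads = read_fasta(fh)
# left, core, right = split_query(reads["read1"], 6, 13)
```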

written 19 days ago by clear.choi
