Question: Turning pairwise alignment description in XML/JSON to a multiline string
1
gravatar for Yotam ConstantiniBiostar
8 weeks ago by

Hey all, I'm getting blastp results in a JSON, and want to turn them into a readable format to display on a website

to formulate this as a general question about pairwise alignment, I'm looking for a function (don't matter which programming language) that takes in the following arguments: Positions of the alignment on sequence_a (QUERY_FROM, QUERY_TO) Positions of the alignment on sequence_b (HIT_FROM, HIT_TO) The alignment length (ALIGN_LEN), sequence_a, sequence_b (QSEQ,HSEQ) Plus the middle line (MIDLINE)

and returns a long string with line breaks that is a readable representation of the alignment with positions on the sequences.

Sample input:

{
"query_from": 4,
"query_to": 96,
"hit_from": 1,
"hit_to": 99,
"align_len": 100,
"qseq": "MSDYSTMSSGYCSLEVELEDCFFTAK----RNLQSKQPTKNLCKAVEETWHPPTIQEIKQKIDSY---EKFCLGMKLSEDGYYTGFIKVVGLKLRRPVTV",
"hseq": "MTVDSSMSSGYCSLDEELEDCFFTAKTTFFRNLQSKQPSKNVCKAVEETQHPPTIQEIKQKIDSYNSREKHCLGMKLSEDGTYTGFIK-VHLKLRRPVTV",
"midline": "M+  S+MSSGYCSL+ ELEDCFFTAK    RNLQSKQP+KN+CKAVEET HPPTIQEIKQKIDSY   EK CLGMKLSEDG YTGFIK V LKLRRPVTV"
}

Requested output:

Query  4   MSDYSTMSSGYCSLEVELEDCFFTAK----RNLQSKQPTKNLCKAVEETWHPPTIQEIKQ  59
           M+  S+MSSGYCSL+ ELEDCFFTAK    RNLQSKQP+KN+CKAVEET HPPTIQEIKQ
Sbjct  1   MTVDSSMSSGYCSLDEELEDCFFTAKTTFFRNLQSKQPSKNVCKAVEETQHPPTIQEIKQ  60

Query  60  KIDSY---EKFCLGMKLSEDGYYTGFIKVVGLKLRRPVTV  96
           KIDSY   EK CLGMKLSEDG YTGFIK V LKLRRPVTV
Sbjct  61  KIDSYNSREKHCLGMKLSEDGTYTGFIK-VHLKLRRPVTV  99

Thanks!

blast alignment • 83 views
ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by Yotam ConstantiniBiostar0

You can use blast_formatter program included in blast+ package, if you save the output in -outfmt 11 first.

$ blast_formatter -h
USAGE
  blast_formatter [-h] [-help] [-rid BLAST_RID] [-archive ArchiveFile]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-line_length line_length] [-html]
    [-sorthits sort_hits] [-sorthsps sort_hsps]
    [-max_target_seqs num_sequences] [-out output_file] [-parse_deflines]
    [-version]

Take a look at biopython blast parser as well.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by genomax84k

Thank you for the quick reply. I need a solution for the input I've written, as I can only use the JSON data I'm getting. I'm looking for some actual source code I can rewrite into my application

ADD REPLYlink written 8 weeks ago by Yotam ConstantiniBiostar0
0
gravatar for Yotam ConstantiniBiostar
8 weeks ago by

I ended up just writing it in PL/SQL, if you ever need to translate to another programming language, here it is:

function pretty_pairwise_alignment(
    ALIGN_LEN integer,
    query_from INTEGER,
    query_to INTEGER,
    hit_from INTEGER,
    hit_to INTEGER,
    qseq CLOB,
    hseq CLOB,
    midline  CLOB
) return clob is
    pretty_output clob;
    line_break varchar2(6) := chr(13)||chr(10);
    padding_constant integer := 4;
    alignment_consumed integer := 0;
    amount integer := 60;
    query_index integer := query_from;
    query_end integer;
    hit_index integer := hit_from;
    hit_end integer;
    sub_query varchar2(60);
    sub_hit varchar2(60);
    sub_midline varchar2(60);
begin
    dbms_lob.createtemporary(pretty_output,false);
    -- determin the amount of charachters in the stringified positions
    padding_constant := FLOOR(LOG(10,greatest(query_to,hit_to))) + 3;


    while (ALIGN_LEN > alignment_consumed) loop
        -- if this is the last line, fix the amount of charachter in this line
        if (ALIGN_LEN - alignment_consumed <= 60) then amount := ALIGN_LEN - alignment_consumed; end if;

        sub_query := DBMS_LOB.SUBSTR(qseq, amount, alignment_consumed+1);
        sub_midline := DBMS_LOB.SUBSTR(midline, amount, alignment_consumed+1);
        sub_hit := DBMS_LOB.SUBSTR(hseq, amount, alignment_consumed+1);
        query_end := query_index + amount - REGEXP_COUNT(sub_query,'-') - 1;
        hit_end := hit_index + amount - REGEXP_COUNT(sub_hit,'-') - 1;

        DBMS_LOB.APPEND(pretty_output,
            'Query  '||RPAD(TO_CHAR(query_index,'fm999999'),padding_constant)||
            sub_query||'  '||query_end||line_break||
            RPAD(' ',7+padding_constant)||sub_midline||line_break||
            'Sbjct  '||RPAD(TO_CHAR(hit_index,'fm999999'),padding_constant)||
            sub_hit||'  '||hit_end||line_break||line_break
        );

        query_index := query_end + 1;
        hit_index := hit_end + 1;
        alignment_consumed := alignment_consumed + amount;
    end loop;

    return pretty_output;
end pretty_pairwise_alignment;
ADD COMMENTlink written 8 weeks ago by Yotam ConstantiniBiostar0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1820 users visited in the last hour