Find Mismatch And Gap Positions
7.3 years ago
biolab ★ 1.3k

Hi everyone,

I want to find the homologs of many short sequences, and then identify the mismatch and gap positions for each pair. The first step can be easily done by BLAST, FASTA or patscan, but the second step is troublesome for me.

Could you please suggest tools and methods that I can use to complete this task?

Thank you in advance!

blast fasta • 4.7k views
7.3 years ago

I wrote something like this a few weeks ago: http://lindenb.github.io/jvarkit/BlastNToSnp.html

The program reads a BLASTN XML file or stream, walks over the alignments, and prints the variations.

java -jar dist/blastn2snp.jar  blastn.xml | head
#query    hit    hit-index    hsp-index    query-POS    hit-POS    STRAND    REF(hit)    ALT(query)    blast.align_length    blast.hit.var    blast.query.var    blast.mid.var
No definition line    Homo sapiens chromosome 6, alternate assembly CHM1_1.1    1    9    21    74567818    -    T    A    18    A    T    .
No definition line    Homo sapiens chromosome 6, alternate assembly HuRef    2    9    21    71600901    -    T    A    18    A    T    .
No definition line    Homo sapiens chromosome 6, GRCh37.p13 Primary Assembly    3    9    21    74401398    -    T    A    18    A    T    .
No definition line    Homo sapiens chromosome 5, alternate assembly CHM1_1.1    4    1    7    107821121    -    A    G    28    T    C    .
No definition line    Homo sapiens chromosome 5, alternate assembly CHM1_1.1    4    9    16    14262358    +    G    C    18    G    C    .
No definition line    Homo sapiens chromosome 5, alternate assembly CHM1_1.1    4    13    8    132662461    -    T    C    18    A    G    .
No definition line    Homo sapiens chromosome 5, alternate assembly CHM1_1.1    4    20    14    170329095    -    G    C    18    C    G    .
No definition line    Homo sapiens chromosome 5, alternate assembly HuRef    5    1    7    103561224    -    A    G    28    T    C    .
No definition line    Homo sapiens chromosome 5, alternate assembly HuRef    5    9    16    14234054    +    G    C    18    G    C    .
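
For readers who want to see the idea behind "walk over the alignments and print the variations", here is a minimal Python sketch. It is not the jvarkit implementation: the function name `find_differences`, its parameters, and the returned tuple layout are all assumptions made for illustration. It takes the two gapped alignment strings of one HSP (as found in the query and hit sequence fields of BLAST output), assumes both are on the plus strand with 1-based start positions, and lists every mismatch and gap.

```python
def find_differences(qseq, hseq, query_start=1, hit_start=1):
    """Walk two aligned strings (gaps written as '-') and report differences.

    Returns a list of (kind, query_pos, hit_pos, query_base, hit_base) tuples,
    where kind is 'mismatch', 'gap_in_query' or 'gap_in_hit'.
    Positions are 1-based. Assumption: both sequences are on the plus strand.
    For a gap, the position reported on the gapped sequence is that of its
    next ungapped base.
    """
    if len(qseq) != len(hseq):
        raise ValueError("aligned sequences must have equal length")

    differences = []
    qpos, hpos = query_start, hit_start
    for q, h in zip(qseq, hseq):
        if q == "-":
            # The hit has a base that the query lacks.
            differences.append(("gap_in_query", qpos, hpos, q, h))
            hpos += 1
        elif h == "-":
            # The query has a base that the hit lacks.
            differences.append(("gap_in_hit", qpos, hpos, q, h))
            qpos += 1
        else:
            if q.upper() != h.upper():
                differences.append(("mismatch", qpos, hpos, q, h))
            qpos += 1
            hpos += 1
    return differences


if __name__ == "__main__":
    # Hypothetical aligned pair, not taken from the output above.
    for diff in find_differences("ACGT-ACGT", "ACCTTAC-T"):
        print(*diff, sep="\t")
```

Handling minus-strand hits would require decrementing the hit position instead of incrementing it; that is left out here to keep the sketch short.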

Thank you, Pierre, the Java program is cool.


Hi Pierre, can you give me some details on how to compile this file?

I have downloaded and installed jvarkit following instructions here: https://github.com/lindenb/jvarkit/wiki/Compilation

How can I compile and use this file:

my_local_dir/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/blast/BlastNToSnp.java


Dear Pierre

I need blastn2snp, but I don't know how to compile it. If possible, could you explain more about that? Thanks.


Thank you Pierre!

Do you have an explanation/description of each column of the blastn2snp output? I want to understand something like:

REF(hit) ALT(query) blast.align_length blast.hit.var blast.query.var blast.mid.var

CT TG 35 AG CA ..

GTC T 37 GAC A-- ...

"\t" T 26 - T .


REF: reference sequence
ALT: alternate sequence
blast.align_length: length of difference
blast.hit.var, blast.query.var, blast.mid.var: the BLAST lines (query/hit and difference line) as seen in the BLAST output
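
To work with these columns by name, the output can be loaded into dictionaries keyed by the header line. The following is only a sketch: the function name `read_blastn2snp` is hypothetical, and it assumes the real output is tab-delimited with a single header line starting with `#`, as the sample output in this thread suggests.

```python
import csv
import io


def read_blastn2snp(handle):
    """Parse blastn2snp-style output into a list of dicts keyed by column name.

    Assumptions: fields are separated by tabs, and the header is the line
    that starts with '#'. All values are kept as strings.
    """
    # QUOTE_NONE so that literal quote characters in a field are left alone.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = None
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#"):
            # Drop the leading '#' from the first column name.
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if header is None:
            raise ValueError("no header line found before the first data row")
        rows.append(dict(zip(header, fields)))
    return rows


if __name__ == "__main__":
    sample = (
        "#query\thit\thit-index\thsp-index\tquery-POS\thit-POS\tSTRAND\t"
        "REF(hit)\tALT(query)\tblast.align_length\tblast.hit.var\t"
        "blast.query.var\tblast.mid.var\n"
        "No definition line\t"
        "Homo sapiens chromosome 6, alternate assembly CHM1_1.1\t"
        "1\t9\t21\t74567818\t-\tT\tA\t18\tA\tT\t.\n"
    )
    for row in read_blastn2snp(io.StringIO(sample)):
        print(row["hit-POS"], row["REF(hit)"], row["ALT(query)"])
```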
