Question: How to use psl2sam.pl from samtools?
Asked 6.3 years ago by Tianyang Li (Beijing, China):

Hi,

I have a question about how psl2sam.pl from samtools should be used.

This is what the script says

Usage: psl2sam.pl [-a 1] [-b 3] [-q 5] [-r 2] <in.psl>

But there's nothing about what those options mean. Can somebody please provide some more information?

Also, the SAM output from bowtie2 usually includes the query sequences, but BLAT's PSL output contains no sequences. I'm not sure how to incorporate the query sequences into the output of psl2sam.pl.

Tags: blat, samtools, convert, sam
Answer by Niek De Klein (Netherlands), 6.3 years ago:

Edited after Vikas Bansal's post explaining $t


The options are used for calculating the alignment score. If you look at the source code: http://www.koders.com/perl/fidD58E03F477DD4AE0F310DB84C9E0E58F3B61FAB2.aspx?s=%22sam%22#L18

It has a line:

my $score = $a * $t[0] - $b * $t[1] - $q * $gap_open - $r * $gap_ext;

Because of the nice variable naming and my lack of Perl experience, I can't be exactly sure, but from the comment:

# This script calculates a score using the BLAST scoring
# system.

and the gap_open and gap_ext variables, I would say that it calculates the score as

(number of matches * option a) - (number of mismatches * option b) - (number of gap opens * option q) - (number of gap extensions * option r)

where the second term is the mismatch penalty and the last two terms are the gap penalty.

The values given in usage are the default values, so if you do

psl2sam.pl my_psl_file.psl

it will use a = 1, b = 3, q = 5, r = 2. I wasn't sure at first what $t is (see Vikas Bansal's answer), but I think the options work like this (compared to the defaults):

  • if you make option a higher, each match adds more to the score (and if you make it lower, each match adds less)
  • if you make option b higher, a mismatch lowers the score more (and if you make it lower, a mismatch lowers the score less)
  • if you make option q higher, a gap open lowers the score more (and if you make it lower, gap opens affect the score less)
  • if you make option r higher, a gap extension lowers the score more (and if you make it lower, gap extensions affect the score less)
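
To make the effect of the four options concrete, here is a minimal Python sketch of the score line quoted above. It is not part of psl2sam.pl: the function name and the example counts are made up for illustration, and it assumes you already know the number of matches, mismatches, gap opens and gap extensions for an alignment.

```python
def blast_style_score(matches, mismatches, gap_opens, gap_exts,
                      a=1, b=3, q=5, r=2):
    """Mirror of the quoted Perl line:
    $a * $t[0] - $b * $t[1] - $q * $gap_open - $r * $gap_ext
    The defaults are the values shown in the usage message."""
    return a * matches - b * mismatches - q * gap_opens - r * gap_exts


# Hypothetical alignment: 95 matches, 3 mismatches, 1 gap open, 2 gap extensions.
default_score = blast_style_score(95, 3, 1, 2)       # 95 - 9 - 5 - 4
lenient_score = blast_style_score(95, 3, 1, 2, b=1)  # 95 - 3 - 5 - 4

print(default_score, lenient_score)
```

Lowering b from 3 to 1 raises the score of the same alignment, because each mismatch is penalised less.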
Answer by Vikas Bansal (Berlin, Germany), 6.3 years ago:

I think Niek has covered most of your question, so I will answer the remaining parts. @Niek: @t is an array that contains the columns of the BLAT output (PSL file). So $t[0] is the first column of the PSL file, which is the number of matches, and $t[1] is the second column, which is the number of mismatches.
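
As a rough illustration of that column layout, the Python sketch below splits one PSL line on tabs, the same way @t holds the columns in the Perl script. The example line is invented for this post (it is not real BLAT output), and the function and key names are hypothetical.

```python
# Made-up PSL record with the 21 standard tab-separated columns.
EXAMPLE_PSL_LINE = "\t".join([
    "95", "3", "0", "0",      # matches, misMatches, repMatches, nCount
    "1", "2", "0", "0",       # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    "+", "read1", "100",      # strand, qName, qSize
    "0", "100",               # qStart, qEnd
    "chr1", "1000",           # tName, tSize
    "200", "298",             # tStart, tEnd
    "2", "50,48,", "0,52,", "200,250,",  # blockCount, blockSizes, qStarts, tStarts
])


def parse_psl_line(line):
    """Return the columns discussed above from one PSL alignment line."""
    t = line.rstrip("\n").split("\t")
    if len(t) < 21:
        raise ValueError("expected at least 21 tab-separated PSL columns")
    return {
        "matches": int(t[0]),     # $t[0] in the Perl script
        "mismatches": int(t[1]),  # $t[1]
        "strand": t[8],
        "query_name": t[9],       # 10th column, $t[9]
    }


record = parse_psl_line(EXAMPLE_PSL_LINE)
print(record["matches"], record["mismatches"], record["query_name"])
```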

@Tmy: If you are interested in the query sequences, you can modify the script a little. I think the 10th column of the BLAT output contains the query name. So, in the script that Niek linked, you can change line 34 of the code from

@s[6..10] = ('*', 0, 0, '*', '*');

to

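@s[6..8] = ('*', 0, 0);   # RNEXT, PNEXT, TLEN: not set
$s[9]  = $t[9];           # SEQ column: query name from PSL column 10, as a placeholder
$s[10] = '*';             # QUAL: not available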

Now the 10th column (SEQ) of the final SAM output also holds the query name, which you can replace with the actual sequence from your input FASTA file.

P.S.: I did not run the code on my PC, but I think it should work.

EDIT: I just realised that you can also use the 1st column of the SAM format (which is the query name) and, with the help of these names, put the sequences from your input FASTA file into the 10th column of the SAM output. So there is no need to modify the script.
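
A minimal Python sketch of that last idea, under some assumptions: the SAM file is tab-separated with the query name in column 1, the FASTA names (up to the first space) match those query names, and sequences contain only A/C/G/T/N. The function names are made up for illustration. Two details come from the SAM format rather than from this thread: reads with flag bit 16 set are stored reverse-complemented, and bases covered by hard clips (H) at either end of the CIGAR are left out of SEQ.

```python
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Build {name: sequence} from FASTA lines; name is the text after '>' up to the first space."""
    parts, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(p) for n, p in parts.items()}


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def hard_clips(cigar):
    """Return (leading, trailing) hard-clip lengths from a CIGAR string."""
    lead = re.match(r"(\d+)H", cigar)
    trail = re.search(r"(\d+)H$", cigar)
    return (int(lead.group(1)) if lead else 0,
            int(trail.group(1)) if trail else 0)


def fill_seq(sam_line, seqs):
    """Return the SAM line with column 10 (SEQ) filled from seqs, if the query is known."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):          # header lines are passed through
        return line
    f = line.split("\t")
    seq = seqs.get(f[0])
    if seq is None:                   # unknown query: leave the line untouched
        return line
    if int(f[1]) & 16:                # reverse-strand alignment
        seq = reverse_complement(seq)
    lead, trail = hard_clips(f[5])
    f[9] = seq[lead:len(seq) - trail]
    return "\t".join(f)


seqs = read_fasta([">read1 example", "ACGT", "AACC"])
print(fill_seq("read1\t0\tchr1\t201\t0\t2H6M\t*\t0\t0\t*\t*", seqs))
```

To run it on real files, pass an open FASTA file handle to read_fasta and call fill_seq on each line of the SAM file. Soft clipping (S) needs no trimming, because soft-clipped bases stay in SEQ.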
