How to use psl2sam.pl from samtools?
9.5 years ago
Tianyang Li ▴ 490

Hi,

I have a question about how psl2sam.pl from samtools should be used.

This is the usage message the script prints:

Usage: psl2sam.pl [-a 1] [-b 3] [-q 5] [-r 2] <in.psl>


Also, the SAM output from bowtie2 usually contains the query sequences, but BLAT's PSL output does not contain any sequences. I'm not sure how to incorporate the query sequences into the output of psl2sam.pl.

blat samtools sam convert • 4.2k views
9.4 years ago
Niek De Klein ★ 2.6k

Edited after Vikas Bansal's post explaining $t.

The options are used for calculating the score. If you look at the source code: http://www.koders.com/perl/fidD58E03F477DD4AE0F310DB84C9E0E58F3B61FAB2.aspx?s=%22sam%22#L18

it has a line:

my $score = $a * $t[0] - $b * $t[1] - $q * $gap_open - $r * $gap_ext;


Because of the nice variable naming and my lack of Perl experience I can't be exactly sure, but from the comment:

# This script calculates a score using the BLAST scoring
# system.


and the gap_open and gap_ext variables, I would say that it calculates the score as:

(number of matches * option a)
- (number of mismatches * option b) (this is the mismatch penalty; with a penalty of 0, alignments with 0 and 1 mismatches score the same)
- (gap open * option q)
- (gap extension * option r) (these two terms are the gap penalty)
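The formula above can be written out as a small worked example. This is only a sketch in Python, not the script's own code: the function name `blast_style_score` is hypothetical, and `gap_open` and `gap_ext` are passed in directly here, whereas psl2sam.pl derives them itself from the PSL record. The default values are the ones shown in the usage message.

```python
def blast_style_score(matches, mismatches, gap_open, gap_ext,
                      a=1, b=3, q=5, r=2):
    """Score as in the quoted line: a*matches - b*mismatches - q*gap_open - r*gap_ext.

    Defaults mirror the usage message: [-a 1] [-b 3] [-q 5] [-r 2].
    """
    return a * matches - b * mismatches - q * gap_open - r * gap_ext


if __name__ == "__main__":
    # Made-up alignment: 90 matches, 5 mismatches, 1 gap opened, 3 gap extensions.
    print(blast_style_score(90, 5, 1, 3))        # 90 - 15 - 5 - 6 = 64
    # Same alignment with a milder mismatch penalty (as if run with -b 1).
    print(blast_style_score(90, 5, 1, 3, b=1))   # 90 - 5 - 5 - 6 = 74
```

Changing one option at a time like this makes it easy to see how much each penalty contributes for a given alignment.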

The values given in usage are the default values, so if you do

psl2sam.pl my_psl_file.psl


it will use a = 1, b = 3, q = 5, r = 2. I'm not exactly sure what $t is, but I think that it works like this (compared to the default):

• if you make option a higher, each match adds more to the score, so mismatches and gaps lower the score less in relative terms (and if you make it lower, they lower it more in relative terms)
• if you make option b higher, a mismatch will result in a lower score (and if you make it lower, a mismatch will not lower the score as much)
• if you make option q higher, a gap open will make the score lower (and if you make it lower, gap opens won't affect the score as much)
• if you make option r higher, a gap extension will make the score lower (and if you make it lower, a gap extension won't affect the score as much)

9.4 years ago
Vikas Bansal ★ 2.4k

I think Niek has covered most of your question, so I will answer the remaining parts.

@Niek : @t is an array which contains the columns of the BLAT output (PSL file). So $t[0] is the first column of the PSL file, which is the number of matches, and $t[1] is the second column, which is the number of mismatches.

@Tmy : If you are interested in the query sequences, you can modify the script a little. I think the 10th column of the BLAT output contains the query name. So, in the script that Niek linked to, you can change line number 34 of the code from

@s[6..10] = ('*', 0, 0, '*', '*');

to

$s[9] = $t[9]; @s[6..8] = ('*', 0, 0); $s[10] = ('*');


Now the final SAM output also has the query name in the sequence column (10th column), which you can replace with the actual sequence using your input FASTA file.

P.S.: I have not run the code on my PC, but I think it should work.

EDIT : I just realised that you can also use the 1st column of the SAM format (which is the query name) and, with the help of these names, put the sequences from your input FASTA file into the 10th column of the SAM file. So there is no need to modify the script above.
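The approach in the EDIT could look roughly like the Python sketch below. It is not part of samtools, and the helper names `read_fasta` and `fill_seq` are made up for illustration. It assumes the query names in the first SAM column match the first word of the FASTA headers. Two details come from the SAM format rather than from this thread: SEQ is stored on the reference strand, so reads with flag 16 are reverse-complemented, and bases covered by hard clips (H) in the CIGAR must be left out of SEQ.

```python
import re

# Complement table for reverse-strand alignments (flag bit 16).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {name: sequence}; name is the first word after '>'."""
    parts, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(p) for n, p in parts.items()}


def fill_seq(sam_line, seqs):
    """Put the query sequence into column 10 (SEQ) of one SAM line."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):          # header lines are left untouched
        return line
    fields = line.split("\t")
    if len(fields) < 11 or fields[0] not in seqs:
        return line                   # nothing we can fill in
    seq = seqs[fields[0]]
    if int(fields[1]) & 16:           # reverse strand: store the reverse complement
        seq = seq.translate(_COMPLEMENT)[::-1]
    ops = re.findall(r"(\d+)([MIDNSHP=X])", fields[5])
    if ops and ops[0][1] == "H":      # leading hard clip: drop those bases
        seq = seq[int(ops[0][0]):]
    if len(ops) > 1 and ops[-1][1] == "H":   # trailing hard clip
        seq = seq[:len(seq) - int(ops[-1][0])]
    fields[9] = seq
    return "\t".join(fields)


if __name__ == "__main__":
    seqs = read_fasta([">q1 example read", "ACGT", "TTAA"])
    print(fill_seq("q1\t0\tchr1\t100\t255\t2H4M2H\t*\t0\t0\t*\t*", seqs))
```

To use it on real files, read the FASTA once with `read_fasta(open(...))` and pass every line of the psl2sam.pl output through `fill_seq`. It is worth checking a few records by hand, since the clipping style in the CIGAR strings depends on what the converter emits.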