Low Complexity region determination using SEG
21 months ago
jaqx008 ▴ 110

Hello all. I have a protein sequence in FASTA format and I am trying to make figures that show its regions of low complexity using the SEG program. I'm not sure how to install SEG, nor do I really understand how to run it. Can anyone help me with this or suggest a better alternative? Thanks.

21 months ago
Mensur Dlakic ★ 15k

SEG is an old program which I have on my computers, but it took a while to actually trace how I got it. Anyway, it is available for download:

ftp://ftp.ncbi.nlm.nih.gov/pub/seg/seg/

Download all the files into one directory and type make from inside it. That compiles a binary called seg, which prints its options when run without any arguments. It works in two general modes: 1) with the -x switch, it masks low-complexity regions with x characters; 2) by default, it shows low-complexity regions on the left and normal-complexity regions on the right.

If you have this sequence saved in a file hco.fas:

>HCO
MPEYRSQMLALPFAVVALVLFFVQLVAGGLLALYYLNPDLLSGIANFNLI
RAYHINALILWLFSATFAAVFYLTPILAKRELWGQSLVKLLAVVLVLVVI
GIFATLPLMQSGTNIWIANQPMLVEGKEYVEAGRLWDIFIFIGFIIVAVV
VLKTLPSPKEWPLALWALVIGAAGTFILYIPGNLFFKSVVVSEYFRWWTV
HYWVEGSLEVAYAGAIGLVLMLLIPDPRVKKVVDKYIFYDVILAATSGVI
GQGHHYFWIGTPTFWILLGGVISVLEIVPLALMALESLRIAKELKQPFPN
IPSLYFMVGILIFGFIGVSLLGLIQTWPWTNWWEHGTWVTPSHGHECMMA
FAMGGIALLYLALPDLTGKPIDRTLVLWSKRAFWLMFIGQVILASTFGLA
GTVQIYHYWILAEPWQKVLEARFPFVPGIVFGGAMVFLGYLHLAASMFRH
LLMPVEGEEYKPAAVKKSFLTTFDHFPFLVVLAVFFALIGTTG


Running ./seg hco.fas will give you this output:

>HCO
                                1-8      MPEYRSQM
lalpfavvalvlffvqlvaggllalyylnp  9-41
                           dll
                                42-86    SGIANFNLIRAYHINALILWLFSATFAAVF
                                         YLTPILAKRELWGQS
                 lvkllavvlvlvv  87-99
                                100-137  IGIFATLPLMQSGTNIWIANQPMLVEGKEY
                                         VEAGRLWD
                ififigfiivavvv  138-151
                                152-477  LKTLPSPKEWPLALWALVIGAAGTFILYIP
                                         GNLFFKSVVVSEYFRWWTVHYWVEGSLEVA
                                         YAGAIGLVLMLLIPDPRVKKVVDKYIFYDV
                                         ILAATSGVIGQGHHYFWIGTPTFWILLGGV
                                         ISVLEIVPLALMALESLRIAKELKQPFPNI
                                         PSLYFMVGILIFGFIGVSLLGLIQTWPWTN
                                         WWEHGTWVTPSHGHECMMAFAMGGIALLYL
                                         ALPDLTGKPIDRTLVLWSKRAFWLMFIGQV
                                         ILASTFGLAGTVQIYHYWILAEPWQKVLEA
                                         RFPFVPGIVFGGAMVFLGYLHLAASMFRHL
                                         LMPVEGEEYKPAAVKKSFLTTFDHFP
                   flvvlavffal  478-488
                                489-493  IGTTG
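
Since the goal is a figure, it may help to turn this output into coordinates first. Below is a minimal Python sketch, not part of SEG itself, that reads the output above and collects the segments. It assumes that a line with the sequence before the coordinates is a low-complexity segment, a line with the coordinates first is a normal-complexity segment, and a bare sequence line continues the previous segment. The function name parse_seg_tree is my own.

```python
import re

# A coordinate token such as "9-41".
COORDS = re.compile(r"^\d+-\d+$")

def parse_seg_tree(text):
    """Return (low, high) lists of (start, end, sequence) tuples."""
    low, high = [], []
    current = None  # segment being extended by continuation lines
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith(">"):
            continue  # skip blank lines and the FASTA header
        if len(tokens) == 2 and COORDS.match(tokens[0]):
            # Coordinates first: a normal-complexity segment.
            start, end = map(int, tokens[0].split("-"))
            current = [start, end, tokens[1]]
            high.append(current)
        elif len(tokens) == 2 and COORDS.match(tokens[1]):
            # Sequence first: a low-complexity segment.
            start, end = map(int, tokens[1].split("-"))
            current = [start, end, tokens[0]]
            low.append(current)
        elif len(tokens) == 1 and current is not None:
            # A bare sequence line continues the previous segment.
            current[2] += tokens[0]
    return [tuple(s) for s in low], [tuple(s) for s in high]

# The first few lines of the output shown above.
sample = """>HCO
1-8    MPEYRSQM
lalpfavvalvlffvqlvaggllalyylnp    9-41
dll
42-86   SGIANFNLIRAYHINALILWLFSATFAAVF
YLTPILAKRELWGQS
"""
low, high = parse_seg_tree(sample)
for start, end, seq in low:
    print(start, end, seq)
```

The (start, end) pairs in low are 1-based and inclusive, so they can be passed directly to whatever plotting tool you prefer.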


Regions in lower-case letters on the left are the low-complexity parts. If you run ./seg hco.fas -x instead, those regions are masked:

>HCO
MPEYRSQMxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxSGIANFNLIRAYHINALIL
WLFSATFAAVFYLTPILAKRELWGQSxxxxxxxxxxxxxIGIFATLPLMQSGTNIWIANQ
PMLVEGKEYVEAGRLWDxxxxxxxxxxxxxxLKTLPSPKEWPLALWALVIGAAGTFILYI
PGNLFFKSVVVSEYFRWWTVHYWVEGSLEVAYAGAIGLVLMLLIPDPRVKKVVDKYIFYD
VILAATSGVIGQGHHYFWIGTPTFWILLGGVISVLEIVPLALMALESLRIAKELKQPFPN
IPSLYFMVGILIFGFIGVSLLGLIQTWPWTNWWEHGTWVTPSHGHECMMAFAMGGIALLY
LALPDLTGKPIDRTLVLWSKRAFWLMFIGQVILASTFGLAGTVQIYHYWILAEPWQKVLE
ARFPFVPGIVFGGAMVFLGYLHLAASMFRHLLMPVEGEEYKPAAVKKSFLTTFDHFPxxx
xxxxxxxxIGTTG
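
The masked output is also easy to convert into coordinates. Here is a small Python sketch, under the assumption that every run of lower-case x in the masked sequence is one low-complexity region; the helper names are mine, not anything SEG provides.

```python
import re

def read_single_fasta(text):
    """Join the sequence lines of a one-record FASTA string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return "".join(ln for ln in lines if not ln.startswith(">"))

def masked_intervals(masked_seq, mask_char="x"):
    """Return 1-based inclusive (start, end) pairs for each run of mask_char."""
    pattern = re.escape(mask_char) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, masked_seq)]

# Toy record, wrapped over two lines to show that line breaks are ignored.
toy = """>toy
MPEYxxxSGI
xxLK
"""
print(masked_intervals(read_single_fasta(toy)))  # [(5, 7), (11, 12)]
```

Matching only lower-case x keeps a genuine upper-case X residue in the input from being counted as masked.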


Copy the seg binary into a directory listed in $PATH, and you will be able to run it from any directory without the ./ prefix.
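
If you want to call seg from a script, the sketch below checks whether it is on your PATH and falls back to ./seg otherwise. It is only an illustration: the function names are hypothetical, and the only options it uses are the ones shown in this thread (the FASTA file, optionally followed by -x).

```python
import shutil
import subprocess

def find_seg(fallback="./seg"):
    """Return the full path of seg if it is on PATH, else the fallback."""
    return shutil.which("seg") or fallback

def build_seg_command(fasta_path, mask=False, seg_path=None):
    """Build the argument list for seg, adding -x when masking is wanted."""
    cmd = [seg_path or find_seg(), fasta_path]
    if mask:
        cmd.append("-x")
    return cmd

def run_seg(fasta_path, mask=False, seg_path=None):
    """Run seg and return what it prints; raises if seg exits with an error."""
    cmd = build_seg_command(fasta_path, mask, seg_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# Show the command that would be run, without needing seg to be installed.
print(build_seg_command("hco.fas", mask=True, seg_path="./seg"))
```

Calling run_seg("hco.fas", mask=True) would then return the masked sequence as a string, provided the binary was compiled successfully.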

I think seg now comes with the NCBI BLAST package.

SEG was always a part of BLAST, but in older versions the masking was done internally and there was no way to print a summary like the one I showed above. Do more recent versions of BLAST do that?

Thanks for clarifying. Recent BLAST versions do not give a summary like that, as far as I know.

Thanks for your response. I ran into the following errors while running make in the folder. Part of the output is shown below.

genwin.c:793:8: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
extern upper(string, len)
~~~~~~ ^
genwin.c:806:8: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
extern lower(string, len)
~~~~~~ ^
genwin.c:861:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
tfree(ptr)
^
34 warnings and 11 errors generated.
make: *** [makefile:11: genwin.o] Error 1

Thank you. I have been able to run the analysis using segmasker instead. Again, thanks.
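
For anyone else going the segmasker route: as far as I know, running segmasker -in hco.fas -outfmt interval prints a header line for each sequence followed by one "start - end" line per masked region. The Python sketch below parses text in that layout. Both the layout and the zero-based numbering are assumptions on my part, so check them against your own output, and note that segmasker may not report exactly the same regions as the standalone seg.

```python
import re

# One interval line, e.g. "8 - 40".
INTERVAL = re.compile(r"^(\d+)\s*-\s*(\d+)$")

def parse_interval_output(text, zero_based=True):
    """Map each sequence ID to a list of 1-based inclusive (start, end) pairs."""
    regions = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            regions[current] = []
            continue
        match = INTERVAL.match(line)
        if match and current is not None:
            start, end = int(match.group(1)), int(match.group(2))
            if zero_based:
                # Assumption: positions are zero-based, so shift both ends by one.
                start, end = start + 1, end + 1
            regions[current].append((start, end))
    return regions

# Made-up text in the assumed layout, not real segmasker output.
example = """>HCO
8 - 40
86 - 98
"""
print(parse_interval_output(example))
```

Set zero_based=False if your version turns out to report 1-based positions.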

