Remove ambiguous bases from alignment sequence

6.8 years ago

a.moner • 0

Hi, could you please help me with this? I have a very big alignment file with many ambiguous bases, and I want to:

1- replace the ambiguous bases (Y, W, S, R, M and K) with an underscore (_);

2- remove every column that contains an ambiguous base, keeping only columns made of the four bases A, T, C and G, and also remove all gaps.

Thanks! Here is an example:

S1 CCGCCGCCGCCTCC
S2 CCGCCGCCGWCTCC
S3 CCGCCMCCGCCTCC
S4 CCGCCTCCGCCTCC

For the first question, I want the output to look like this:

S1 CCGCCGCCGCCTCC
S2 CCGCCGCCG_CTCC
S3 CCGCC_CCGCCTCC
S4 CCGCCTCCGCCTCC
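The first task can also be sketched with standard awk, without t_coffee. This is a minimal sketch, assuming the file holds one "name sequence" pair per line as in the example (real Clustal or FASTA files need different parsing); the function name mask_ambiguous is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the ambiguity codes Y, W, S, R, M and K with "_".
# Assumes one "name sequence" pair per line, as in the example above.
# Only the second field is edited, so names such as "S2" (which contain S)
# are left untouched. Assigning to $2 rebuilds the line with single spaces.
# Lines with fewer than two fields (e.g. blank lines) are printed unchanged.
mask_ambiguous() {
  awk 'NF >= 2 { gsub(/[YWSRMK]/, "_", $2) } { print }'
}
```

Usage, with hypothetical file names: mask_ambiguous < alignment.txt > masked.txt. To also mask the other IUPAC ambiguity codes (N, B, D, H, V), add them to the bracket expression.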

For the second question, I want the output to look like this:

S1 CCGCCCCGCTCC
S2 CCGCCCCGCTCC
S3 CCGCCCCGCTCC
S4 CCGCCCCGCTCC
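The second task needs every sequence to be read before any column can be dropped. A minimal awk sketch, assuming one "name sequence" pair per line and sequences of equal length; the function name keep_acgt_columns is hypothetical. It drops a column if any sequence has something other than uppercase A, C, G or T there (ambiguity codes, gaps, or an underscore):

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only alignment columns made of A, C, G and T.
# Assumes "name sequence" lines with all sequences of the same length.
# Any other character in a column (ambiguity code, "-", ".", "_", lowercase)
# causes the whole column to be removed from every sequence.
keep_acgt_columns() {
  awk '
    NF >= 2 {
      n++
      name[n] = $1
      seq[n] = $2
      len = length($2)
      for (i = 1; i <= len; i++)
        if (substr($2, i, 1) !~ /[ACGT]/)
          bad[i] = 1            # mark column i for removal
    }
    END {
      # Second pass over the stored sequences: print only unmarked columns.
      for (r = 1; r <= n; r++) {
        out = ""
        for (i = 1; i <= length(seq[r]); i++)
          if (!(i in bad))
            out = out substr(seq[r], i, 1)
        print name[r], out
      }
    }'
}
```

Usage, with hypothetical file names: keep_acgt_columns < alignment.txt > acgt_only.txt. The whole alignment is held in memory, which keeps the sketch simple; for a very large file, reading it twice (once to mark columns, once to print) would use less memory.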

genome next-gen sequencing alignment sequence

t_coffee provides very good alignment reformatting options through its seq_reformat utility. For example, to change all A to 1 and all T to a gap:

t_coffee -other_pg seq_reformat -in=input.aln -output=clustalw_aln -out=output.aln -action  +convert 'A1' 'T-'

The command for removing gapped columns is:

t_coffee -other_pg seq_reformat -in=a.aln -output=clustalw_aln -out=test.aln -action  +convert +rm_gap  n

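The value n after +rm_gap is a threshold on the percentage of gaps in a column; set it according to how many gaps a column may contain before it is removed.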

6.8 years ago by microfuge ★ 1.9k