I have an alignment FASTA file that looks like this:
    $ cat alignment_file.fasta
    >Ref1
    ATCG
    >S1
    AT-C
    >S2
    CTCG
    >S3
    ATCG
    >S4
    -TCG
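To process the alignment in a script, it first has to be read into (name, sequence) pairs. Below is a minimal Python sketch. It assumes standard FASTA, where the name is the first word after ">" and a sequence may be wrapped over several lines. `read_fasta` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs, keeping file order.

    Assumes the record name is the first whitespace-separated word after '>'
    and that a sequence may be wrapped over several lines.
    """
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)  # sequence line (possibly one of several)
    if name is not None:
        records.append((name, "".join(chunks)))  # flush the last record
    return records
```

A file handle can be passed directly, e.g. `with open("alignment_file.fasta") as fh: records = read_fasta(fh)`.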
Specifically, what I want is:
1) If a column contains a SNP (e.g. the "C" in S2 at position 1), I want any gap in that column (the "-" in S4 at position 1) to be replaced with "N".
2) If a column contains a gap but no SNP (e.g. the gap in S1 at position 3), I want the gap replaced with the reference base at that position (here "C", from Ref1).
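The two rules can be applied column by column. Below is a minimal Python sketch, not an existing tool. It assumes all sequences are the same length, that case is ignored when comparing bases, and that "-" and "N" never count as SNPs. It also assumes that a column where the reference itself has a gap (or "N") is left unchanged, because there is no reference base to copy. `fill_gaps` is a hypothetical name, and it takes the reference name as a parameter.

```python
from typing import List, Tuple


def fill_gaps(records: List[Tuple[str, str]], ref_name: str = "Ref1",
              gap: str = "-", unknown: str = "N") -> List[Tuple[str, str]]:
    """Fill gaps per column: 'N' if the column has a SNP, else the reference base.

    Assumptions: equal-length sequences, case-insensitive comparison,
    gap/unknown characters are never SNPs, and columns where the reference
    is a gap or unknown are left untouched.
    """
    names = [name for name, _ in records]
    seqs = [list(seq) for _, seq in records]
    if ref_name not in names:
        raise ValueError(f"reference {ref_name!r} not found")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences are not all the same length")
    ref = seqs[names.index(ref_name)]
    not_snp = {gap, unknown.upper()}

    for col, ref_base in enumerate(ref):
        ref_upper = ref_base.upper()
        if ref_upper in not_snp:
            continue  # no usable reference base in this column
        column = [s[col].upper() for s in seqs]
        # A SNP is any real base in the column that differs from the reference.
        has_snp = any(b not in not_snp and b != ref_upper for b in column)
        fill = unknown if has_snp else ref_base  # rule 1 -> N, rule 2 -> ref base
        for s in seqs:
            if s[col] == gap:
                s[col] = fill
    return [(name, "".join(s)) for name, s in zip(names, seqs)]
```

On the example, column 1 has the SNP "C" (S2), so the gap in S4 becomes "N". Column 3 has no SNP, so the gap in S1 becomes the reference "C".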
So I want to turn the above alignment into this:
    >Ref1
    ATCG
    >S1
    ATCC
    >S2
    CTCG
    >S3
    ATCG
    >S4
    NTCG
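Writing the (name, sequence) pairs back out as FASTA is the last step. Below is a minimal Python sketch. `format_fasta` is a hypothetical helper. By default it writes each sequence on one line, as in the desired output above, and the optional `width` parameter (also an assumption) wraps long sequences.

```python
from typing import List, Tuple


def format_fasta(records: List[Tuple[str, str]], width: int = 0) -> str:
    """Render (name, sequence) pairs as FASTA text.

    width=0 keeps each sequence on a single line; a positive width wraps
    sequences to at most that many characters per line.
    """
    out: List[str] = []
    for name, seq in records:
        out.append(f">{name}")
        if width > 0:
            # Split the sequence into fixed-width chunks.
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        else:
            out.append(seq)
    return "".join(line + "\n" for line in out)
```

The returned string can then be written to a file of your choice (for example a hypothetical `filled.fasta`).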
Is there any easy way or any tool with which I can do this? Thanks so much!