Hi all, I am looking for a way to filter out (remove) some sequences from a FASTA file of aligned protein sequences, based on a list of headers given in a separate input file. I am aware that there are similar questions here, but in most cases the answers show how to extract the listed sequences, not how to remove them. I also tried the suggested software (e.g., BBMap), but it either destroyed my alignment and/or replaced '-' with 'N', which is particularly bad for protein alignments. Any idea how to do this? A bash, python, or perl script would be great. Many thanks in advance!
### fasta file ###
>Species_X
PFEAIQIINLPHRYGANTFKLHRLPVPRPGQVLGLVGTNGIGKSTALKILAGKLKPNLGR
FTSPPDWQEILTHFRGSELQNYFTRILEDNLKAIIKPQYVDHIPLSGGELQRFAIAVVAI
QNAEIYMFDEPSSYLDVKQRLKAAQVVRSYVIVVEHDLSVLDYLSDFICCLYGKPGAYGV
VTLPFSVREGINIFLAGFVPTENLRFRDESLTFKGEFTDSQIIVMLGENGTGKTTFIRML
AGLLNVSYKPQKISPKFQNSVRHLLHQKIRDSYMHPQFMSDVMKPLQIEQLMDQEVVNLS
GGELQRVALTLCLGKPADIYLIDEPSAYLDSEQRIVASKVIKRFILHAKKTAFVVEHDFI
MATYLADRVIVYEGQPSIDCTANCPQSLLSGMNLFLSHLNITFRRDPTNFRPRINKLEST
KDREQKSAGSYY
>Species_Y
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
--------------------------------------------MLGENGTGKTTFIRML
AG--NVSYKPQ--------TVRQLLHDKIRDAYTHPQFVSDVIRPLQIEQLLDQVVKTLS
GGEKQRVAITLCLGKPADIYLIDEPSAHLDSEQRITASKVIKRFILHAKKTAFIVEHDFI
MATYLADRVIVYEGQPAVKCIAHSPQSLLSGMNLFLSHLNITFRRDPTNFRPRINKLESI
KDKEQKTAGSYY
>Species_Z
PFGAIHIINLPHRYSANSFKLHRLPMPRPGQVLGLVGTNGIGKSTALKILSGKLKPNLGR
FDNPPDWEEILKYFRGSELQNYFTKVLEDDLKAVVKPQYVDQIPLSGGELQRFAIGLVCV
QKADVYMFDEPSSYLDVKQRLAAARSIREYVIVVEHDLSVLDYLSDFVCVLYGRPALYGV
VTLPASVREGINIFLDGHIPTENLRFREESLTFRGSFTDSEIIVMMGENGTGKTTFCKML
AGAENISMKPQKITPKFQGTVRQLFFKRIKAAFLSPQFQTDVYKPLKIDDFIDQEVQNLS
GGELQRVAIVLALGIPADIYLIDEPSAYLDSEQRIVASRVIKRFIMHTKKTAFIVEHDFI
MATYLADRVIVFDGQPSVDAHANAPESLVTGCNTFLKNLDVTFRRDPNSYRPRINKYQSQ
MDQEQKLAGNY-
### to_remove.txt ###
Species_X
Species_Z
### desired output ###
>Species_Y
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
--------------------------------------------MLGENGTGKTTFIRML
AG--NVSYKPQ--------TVRQLLHDKIRDAYTHPQFVSDVIRPLQIEQLLDQVVKTLS
GGEKQRVAITLCLGKPADIYLIDEPSAHLDSEQRITASKVIKRFILHAKKTAFIVEHDFI
MATYLADRVIVYEGQPAVKCIAHSPQSLLSGMNLFLSHLNITFRRDPTNFRPRINKLESI
KDKEQKTAGSYY
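A minimal Python sketch of one way to do this, assuming that each line of to_remove.txt matches the first whitespace-delimited word of a header (as in the example above; a leading '>' in the list is tolerated). It only looks at header lines to decide whether to keep or drop a record and copies every kept line through unchanged, so '-' gaps, line wrapping and the alignment columns are never touched. The script name `filter_fasta.py` and the helper names are hypothetical.

```python
import sys


def load_ids(path):
    """Read one header ID per line, ignoring blank lines and an optional leading '>'."""
    with open(path) as fh:
        return {line.strip().lstrip(">") for line in fh if line.strip()}


def filter_fasta(lines, remove_ids):
    """Yield the lines of a FASTA file, skipping every record whose ID is in remove_ids.

    Sequence lines are passed through verbatim, so gap characters and
    line wrapping are preserved exactly.
    """
    keep = True
    for line in lines:
        if line.startswith(">"):
            # Assumption: the record ID is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id not in remove_ids
        if keep:
            yield line


def main(argv):
    if len(argv) != 3:
        print("usage: filter_fasta.py alignment.fasta to_remove.txt > filtered.fasta",
              file=sys.stderr)
        return
    remove_ids = load_ids(argv[2])
    with open(argv[1]) as fh:
        sys.stdout.writelines(filter_fasta(fh, remove_ids))


if __name__ == "__main__":
    main(sys.argv)
```

Usage: `python filter_fasta.py alignment.fasta to_remove.txt > filtered.fasta`. Because the file is streamed line by line, large alignments do not need to fit in memory. If the IDs in your list must match the full header line instead of its first word, change how `record_id` is computed.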
Have you already searched this site for a solution yourself? This has been asked, and answered, a zillion times already.