I have a tab-delimited text file with the sequence IDs in column 1 and the corresponding HMM names in column 7, as shown below (only those two columns are shown).
gi|336321007|ref|YP_004600975.1|    adh_short_C2
gi|336321007|ref|YP_004600975.1|    adh_short
gi|336321007|ref|YP_004600975.1|    KR
gi|557685240|ref|YP_008788710.1|    PS-DH
gi|557685240|ref|YP_008788710.1|    adh_short_C2
gi|557685240|ref|YP_008788710.1|    adh_short
gi|557685240|ref|YP_008788710.1|    KR
gi|557685240|ref|YP_008788710.1|    ketoacyl-synt
gi|557685240|ref|YP_008788710.1|    Ketoacyl-synt_C
. . . .
For every unique sequence ID in column 1, I want to keep all the rows whose HMM name is 'adh_short_C2', 'adh_short', or 'KR', e.g. gi|336321007|ref|YP_004600975.1| in this case.
I also want to delete all the rows for any ID that has other HMM names in addition to 'adh_short_C2', 'adh_short', or 'KR', e.g. gi|557685240|ref|YP_008788710.1| in this case.
Desired output: the rows for the IDs that have only 'adh_short_C2', 'adh_short', or 'KR' and no other HMM names.
I tried the code below, but it doesn't work well because it also picks up the IDs that have other HMM names:
adh_short_C2_list <- subset(adh_short_C2, select=`seq id`)
adh_short_list <- subset(adh_short, select=`seq id`)
How can I apply these two conditions together, or step by step?
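One way to combine the two conditions might be to first collect the IDs that have any HMM name outside the allowed set, and then drop every row belonging to those IDs. The following is only a minimal base R sketch under assumptions: the data frame `hits` and its column names `seq_id` and `hmm` are made up for illustration, and the rows are retyped from the example above rather than read from the actual file.

```r
# Toy data rebuilt from the example rows in the question.
# `hits`, `seq_id`, and `hmm` are hypothetical names, not from the real file.
hits <- data.frame(
  seq_id = c(rep("gi|336321007|ref|YP_004600975.1|", 3),
             rep("gi|557685240|ref|YP_008788710.1|", 6)),
  hmm = c("adh_short_C2", "adh_short", "KR",
          "PS-DH", "adh_short_C2", "adh_short", "KR",
          "ketoacyl-synt", "Ketoacyl-synt_C"),
  stringsAsFactors = FALSE
)

# HMM names an ID is allowed to have.
allowed <- c("adh_short_C2", "adh_short", "KR")

# Step 1: IDs that have at least one HMM name outside the allowed set.
bad_ids <- unique(hits$seq_id[!(hits$hmm %in% allowed)])

# Step 2: keep only the rows whose ID is not among those IDs.
kept <- hits[!(hits$seq_id %in% bad_ids), ]

print(kept)
```

With the real file, `hits` would presumably be built by reading the table (for example with `read.delim(..., header = FALSE)`) and taking columns 1 and 7; the exact call depends on whether the file has a header line.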