Hello
I have a question. I have a contigs file that I want to annotate using prokka, but I get an error message saying that the file contains a duplicate sequence ID: scaffold36|size13034. This makes sense, because I merged several assembly files and removed duplicates using cd-hit and seqkit, and I think they did not do the job perfectly.
So what I need is to eliminate the duplicated sequences 'manually' (or using another software).
Basically, what I want to do is this.
I have a file like this:
>scaffold1|size1334
ACTGATGATACAGATACAGAAAGTAGAGATCGATGATAGA..
>scaffold2|size23034
ACAGATGAGACAGATTGACAGATAGAGATAGAGGATAGGACAG..
>scaffold3|size11654
ATAGCGCTCGCGCGCCGCGCGGCGGGGTAGAGAGATCTTTTGAGAGAGA..
>scaffold4|size3034
TGGGGTAGAGAGAGAGAGAGAAGAGGAAGAGAGGAGAGAGGA..
>scaffold2|size23034
ACAGATGAGACAGATTGACAGATAGAGATAGAGGATAGGACAG..
>scaffold100|size304
AAAAAAATACAGATAGAGAGAGAGAGGAGAGAGAGAG..
>scaffold67|size2400
ATAGAGAGAGAGAGAGAGAGAGAGAGAGGAGAGAGAGAGA..
I want to eliminate the duplicated scaffold (in this case scaffold2: the line >scaffold2|size23034 and its sequence, because it appears two times),
so the output will be:
>scaffold1|size1334
ACTGATGATACAGATACAGAAAGTAGAGATCGATGATAGA..
>scaffold2|size23034
ACAGATGAGACAGATTGACAGATAGAGATAGAGGATAGGACAG..
>scaffold3|size11654
ATAGCGCTCGCGCGCCGCGCGGCGGGGTAGAGAGATCTTTTGAGAGAGA..
>scaffold4|size3034
TGGGGTAGAGAGAGAGAGAGAAGAGGAAGAGAGGAGAGAGGA..
>scaffold100|size304
AAAAAAATACAGATAGAGAGAGAGAGGAGAGAGAGAG..
>scaffold67|size2400
ATAGAGAGAGAGAGAGAGAGAGAGAGAGGAGAGAGAGAGA..
Each scaffold should appear only once. Thank you.
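
For reference, the behaviour I am after can be sketched in Python roughly like this. It is only a minimal sketch, assuming that the sequence ID is the first whitespace-delimited token after '>' (e.g. scaffold2|size23034), that the first occurrence of each ID is the one to keep, and that records with the same ID are true duplicates. The function names dedup_fasta and dedup_file and the file names in the usage note are made up for illustration.

```python
def dedup_fasta(lines):
    """Yield FASTA lines, keeping only the first record seen for each ID.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    Any lines that appear before the first header are dropped.
    """
    seen = set()
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            # Keep this record only if its ID has not been seen before.
            keep = seq_id not in seen
            seen.add(seq_id)
        if keep:
            yield line


def dedup_file(in_path, out_path):
    """Write a copy of the FASTA file at in_path without repeated IDs."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(dedup_fasta(src))
```

It could be called as dedup_file("contigs.fasta", "contigs.dedup.fasta"), where both file names are placeholders. This compares IDs only, not the sequences themselves, so two different sequences that share an ID would be reduced to the first one.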