Unknown X residues in amino acid sequences cause errors in an epitope conservation study


3.4 years ago by anolidinasha

I'm doing an epitope prediction and conservation study using SARS-CoV-2 sequences. Most of the amino acid sequences (e.g. the spike protein) from SARS-CoV-2 isolates from different geographical regions contain unknown residues, shown as X. These cause errors in epitope prediction and in the epitope conservation study with the IEDB epitope conservation tool. What can I do to get past this problem with X? I can't just delete the X residues and carry on with the analysis, because the positions with X fall within the predicted epitopes.

Tags: epitope prediction, epitope conservation, sequence



Either use a tool that can handle missing values (if one exists) or impute the missing residues (using the corresponding nucleotide sequences, if you have them).
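If no existing tool handles X the way you need, one option is to compute conservancy yourself and treat X as missing rather than as a mismatch. The sketch below is a hypothetical re-implementation, not the IEDB tool's algorithm: `window_bounds`, `classify` and `conservancy` are illustrative names, and it assumes ungapped protein sequences, X-free epitopes, and scoring each sequence by its best-matching window. Counting X as a mismatch gives a lower bound on identity and counting it as a match gives an upper bound; when only the X residues decide whether a sequence passes the identity threshold, the sequence is reported as undetermined instead of being forced into either class.

```python
from typing import Iterable, Tuple


def window_bounds(epitope: str, window: str) -> Tuple[float, float]:
    """Return (lower, upper) identity bounds for one aligned window.

    X in the window is unknown: the lower bound counts it as a mismatch,
    the upper bound counts it as a match.
    """
    lower = upper = 0
    for e, w in zip(epitope, window):
        if w == e:
            lower += 1
            upper += 1
        elif w == "X":
            upper += 1
    n = len(epitope)
    return lower / n, upper / n


def classify(epitope: str, sequence: str, min_identity: float = 1.0) -> str:
    """Classify a sequence as 'conserved', 'not_conserved' or 'undetermined'."""
    epitope, sequence = epitope.upper(), sequence.upper()
    k = len(epitope)
    best_lower = best_upper = 0.0
    # Slide the epitope along the sequence and keep the best bounds seen.
    for i in range(len(sequence) - k + 1):
        lower, upper = window_bounds(epitope, sequence[i:i + k])
        best_lower = max(best_lower, lower)
        best_upper = max(best_upper, upper)
    if best_lower >= min_identity:
        return "conserved"  # passes even if every X is a mismatch
    if best_upper >= min_identity:
        return "undetermined"  # only the X residues decide the outcome
    return "not_conserved"  # fails even if every X is a match


def conservancy(epitope: str, sequences: Iterable[str],
                min_identity: float = 1.0) -> dict:
    """Count classes and the percent conserved among determinable sequences."""
    counts = {"conserved": 0, "not_conserved": 0, "undetermined": 0}
    for seq in sequences:
        counts[classify(epitope, seq, min_identity)] += 1
    determinable = counts["conserved"] + counts["not_conserved"]
    percent = 100.0 * counts["conserved"] / determinable if determinable else None
    return {**counts, "percent_conserved": percent}


# Example: one exact match, one match hidden behind an X, one real mismatch.
print(conservancy("KTAYI", ["MKTAYIAK", "MKTXYIAK", "MKSAYIAK"]))
```

Here `percent_conserved` is computed over determinable sequences only; the `undetermined` count also lets you report the range you would get by treating all of those sequences as conserved or all as not conserved.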

3.4 years ago by Jean-Karim Heriche


I downloaded the SARS-CoV-2 nucleotide sequences from the GISAID database and then translated them into amino acid sequences using a reference sequence. The X residues are there because of the unknown N nucleotides in the SARS-CoV-2 sequences I selected. So, is there a way to work with X without having to replace these sequences with new ones?
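Since the X residues come from N bases, it may be worth checking how the translation step handles them. A codon containing N does not always have to become X: if every possible base at the N position gives the same amino acid (for example GCN is always alanine), the residue can be recovered. The sketch below assumes in-frame coding sequences that have already been extracted with the help of the reference and uses the standard genetic code; `translate_codon` and `translate` are hypothetical names, and depending on the tool, your translation may already resolve such codons. It only recovers residues at degenerate positions, so X remains wherever the unknown base could change the amino acid.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon: str) -> str:
    """Translate one codon; resolve N only if all expansions agree, else X."""
    options = [BASES if base == "N" else base for base in codon.upper()]
    # Any other ambiguity code (R, Y, ...) is not in the table and yields X.
    aminos = {CODON_TABLE.get("".join(c), "X") for c in product(*options)}
    return aminos.pop() if len(aminos) == 1 else "X"


def translate(nt: str) -> str:
    """Translate an in-frame CDS, ignoring any trailing partial codon."""
    nt = nt.upper()
    return "".join(translate_codon(nt[i:i + 3]) for i in range(0, len(nt) - 2, 3))


# ATG -> M, GCN -> A, CTN -> L, AAN -> X (AAT/AAC give N, AAA/AAG give K)
print(translate("ATGGCNCTNAAN"))  # MALX
```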

3.4 years ago by anolidinasha


You can impute the missing nucleotides, for example with an HMM or a random forest. Alternatively, find another tool that can handle missing data, or don't use sequences with missing information.
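As a much simpler stand-in for HMM or random-forest imputation, the sketch below fills each N with the most common known base in the same column of a nucleotide alignment, and drops sequences with too many N instead of imputing them (the "don't use sequences with missing information" option). It is not an HMM or a random forest; the function names and the 5% default cutoff are hypothetical, and it assumes the sequences are already aligned to equal length.

```python
from collections import Counter
from typing import List


def n_fraction(seq: str) -> float:
    """Fraction of N bases in a sequence (an empty sequence counts as all missing)."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def impute_by_consensus(aligned: List[str], max_n_fraction: float = 0.05) -> List[str]:
    """Drop sequences with too many N, then fill remaining N by column majority."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    kept = [s.upper() for s in aligned if n_fraction(s) <= max_n_fraction]
    consensus = []
    for column in zip(*kept):
        # Count only known bases; a column with no known base stays N.
        counts = Counter(b for b in column if b in "ACGT")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return [
        "".join(c if b == "N" else b for b, c in zip(seq, consensus))
        for seq in kept
    ]


print(impute_by_consensus(["ATGGCT", "ATGNCT", "ATGGCA", "NNNNNN"], max_n_fraction=0.2))
# ['ATGGCT', 'ATGGCT', 'ATGGCA']
```

Consensus imputation pulls unknown positions toward the majority variant, which can make epitopes look more conserved than they are; for a conservation study, comparing results with and without the imputed sequences is one way to see how much the imputation matters.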