Question: Unknown X residues in amino acid sequences cause errors in an epitope conservation study
8 weeks ago, anolidinasha wrote:

I'm doing an epitope prediction and conservation study using SARS-CoV-2 sequences. Most of the amino acid sequences (e.g. the spike protein) from different geographical regions contain unknown amino acids, written as X. These cause errors in epitope prediction and in the conservation analysis with the IEDB epitope conservation tool. What can I do to get around this? I can't simply delete the X residues and carry on with the analysis, because the positions with X interfere with the predicted epitopes.

written 8 weeks ago by anolidinasha

Either use a tool that is able to handle missing values (if there is one) or do imputation (use the corresponding nucleotide sequences if you have them).
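
If no existing tool handles missing values, one option is to compute conservation yourself and skip, per epitope, any sequence whose window contains X. The following is only a minimal sketch of that idea, not how the IEDB tool works. The function name `epitope_conservation`, the 0-based `start` coordinate, and the toy sequences are all made up for illustration, and it assumes the protein sequences are already aligned to the same coordinates.

```python
def epitope_conservation(epitope, start, aligned_seqs, unknown="X"):
    """Return (fraction_identical, n_usable, n_skipped) for one epitope.

    `start` is the 0-based position of the epitope in the alignment.
    Sequences whose window contains the unknown symbol are skipped
    instead of being counted as mismatches.
    """
    end = start + len(epitope)
    usable = matches = skipped = 0
    for seq in aligned_seqs:
        window = seq[start:end]
        if unknown in window:
            skipped += 1
            continue
        usable += 1
        if window == epitope:
            matches += 1
    # No usable sequence means conservation is undefined, not zero.
    fraction = matches / usable if usable else None
    return fraction, usable, skipped


# Toy data invented for illustration only (not real spike sequences).
seqs = [
    "MFVFLVLLPLVSS",
    "MFVFLVLLPLVSS",
    "MFVXLVLLPLVSS",  # X inside the epitope window: skipped
    "MFVFLALLPLVSS",  # real substitution: counted as a mismatch
]
result = epitope_conservation("VFLVL", 2, seqs)
print(result)
```

Reporting the number of skipped sequences next to the conservation value shows how much data each estimate is actually based on.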

written 8 weeks ago by Jean-Karim Heriche

I downloaded the SARS-CoV-2 nucleotide sequences from the GISAID database and then translated them into amino acid sequences using a reference sequence. The unknown X residues are there because of unknown N nucleotides in the sequences I selected. So, is there a way to work with X without having to replace the sequences with new ones?
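
Since the X residues come from N nucleotides, some of them may be avoidable at the translation step: a codon such as GCN encodes alanine whatever the N is, so it does not need to become X. Below is a small sketch of a translation that only emits X when the possible codons disagree. It uses the standard genetic code and assumes an in-frame, ungapped coding sequence; the names `translate_codon` and `translate` are illustrative, and anything other than A, C, G, T/U or N is simply treated as unknown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_codon(codon):
    """Translate one codon; return X only if the N positions change the residue."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(b not in BASES + "N" for b in codon):
        return "X"
    # Expand each N to all four bases and collect the residues that result.
    choices = [BASES if b == "N" else b for b in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(nucleotides):
    """Translate complete codons from position 0; stop codons appear as '*'."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(translate_codon(nucleotides[i:i + 3]) for i in range(0, usable, 3))


print(translate("ATGGCNATNTAA"))
```

In this toy input, GCN resolves to A while ATN stays X, because ATT, ATC and ATA encode isoleucine but ATG encodes methionine. This will not remove X from codons where the N really matters, but it can reduce how many positions are affected.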

written 8 weeks ago by anolidinasha

You can impute the missing nucleotides, for example with an HMM or a random forest. Alternatively, find another tool that can handle missing data, or don't use sequences with missing information.
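
To make the imputation idea concrete, here is a deliberately simple stand-in: it is not an HMM or a random forest, it just fills each N with the most common known base in the same alignment column. The function name `impute_by_column_majority` and the toy sequences are invented for illustration, and it assumes the nucleotide sequences are already aligned to equal length.

```python
from collections import Counter


def impute_by_column_majority(aligned_seqs, unknown="N"):
    """Fill unknown symbols with the column majority.

    Returns (imputed_sequences, filled_positions), where filled_positions[i]
    lists the 0-based columns that were imputed in sequence i.
    """
    if not aligned_seqs:
        return [], []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Most common known symbol per column; None if the whole column is unknown.
    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(c for c in column if c != unknown)
        consensus.append(counts.most_common(1)[0][0] if counts else None)

    imputed, positions = [], []
    for seq in aligned_seqs:
        chars = list(seq)
        filled = []
        for i, c in enumerate(chars):
            if c == unknown and consensus[i] is not None:
                chars[i] = consensus[i]
                filled.append(i)
        imputed.append("".join(chars))
        positions.append(filled)
    return imputed, positions


# Toy alignment invented for illustration only.
seqs = ["ATGGCA", "ATGNCA", "ATGGCA", "NTGGTA"]
imputed, positions = impute_by_column_majority(seqs)
print(imputed)
print(positions)
```

Filling gaps with the majority base pulls sequences toward the consensus, so it can overstate conservation. Keeping the list of imputed positions lets you check whether any of them fall inside a predicted epitope before trusting the result.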

written 8 weeks ago by Jean-Karim Heriche

Thank you very much for your suggestions.

written 8 weeks ago by anolidinasha