Unknown X residues in amino acid sequences cause errors in epitope conservation study
3.4 years ago

I'm doing an epitope prediction and conservation study using SARS-CoV-2 sequences. Most of the amino acid sequences (e.g. the spike protein) from different geographical regions contain unknown X residues. These cause errors in epitope prediction and in the epitope conservation analysis with the IEDB epitope conservation tool. What can I do to get around this? I can't just delete the X residues and carry on with the analysis, because the positions with X interfere with the predicted epitopes.

epitope prediction epitope conservation sequence • 788 views

Either use a tool that is able to handle missing values (if there is one) or do imputation (use the corresponding nucleotide sequences if you have them).

I downloaded the SARS-CoV-2 nucleotide sequences from the GISAID database and then translated them into amino acid sequences using a reference sequence. The unknown X residues come from the unknown N nucleotides in the sequences I selected. So, is there a way to work with X without having to replace the sequences with new ones?
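
Since the X residues come from N bases, one partial workaround is worth checking before anything else: a codon containing N does not always have to become X. If every possible base at the N position gives the same amino acid (for example GCN is always alanine), the residue can be called unambiguously. The sketch below illustrates that idea in plain Python with the standard genetic code. The function names `translate_codon` and `translate` are made up for this example, and it assumes the nucleotide sequence is already in frame; the translation tool used above may already do this, in which case the remaining X residues are truly ambiguous.

```python
from itertools import product

# Standard genetic code, built in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_codon(codon):
    """Translate one codon; N is expanded to all four bases.

    Returns the amino acid if every expansion agrees, otherwise 'X'.
    Any other non-ACGT symbol (gaps, other IUPAC codes) also gives 'X'.
    """
    codon = codon.upper()
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE.get("".join(c), "X") for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(seq):
    """Translate an in-frame nucleotide sequence, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))


# GCN resolves to alanine, NNN stays unknown.
print(translate("ATGGCNNNNTTT"))
```

This only rescues codons with an N in a degenerate position (typically the third base). Long runs of N, which are common in low-coverage regions, will still translate to X.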

You can impute the missing nucleotides, for example with an HMM or a random forest. Alternatively, find another tool that can handle missing data, or don't use sequences with missing information.
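
A middle ground between the last two options is to exclude a sequence only for the epitopes where it actually has missing data, rather than discarding it from the whole study. The following is a minimal sketch of that idea, not the method used by the IEDB tool. It assumes the protein sequences are already aligned to the same coordinates, that conservation means an exact match to the epitope, and the function name `epitope_conservation` and its arguments are hypothetical.

```python
def epitope_conservation(epitope, start, aligned_seqs):
    """Fraction of sequences whose window exactly matches the epitope.

    Sequences with an X inside the epitope window (or too short to cover it)
    are left out of the denominator for this epitope only.
    Returns (fraction, number_of_informative_sequences); the fraction is
    None when no sequence is informative.
    """
    end = start + len(epitope)
    windows = [seq[start:end] for seq in aligned_seqs]
    informative = [
        w for w in windows if len(w) == len(epitope) and "X" not in w
    ]
    if not informative:
        return None, 0
    identical = sum(w == epitope for w in informative)
    return identical / len(informative), len(informative)


# Toy example: the third sequence has X inside the window and is skipped,
# the second has X outside the window and is still counted.
seqs = ["MAFK", "MAFX", "MXFK", "MSFK"]
fraction, n_used = epitope_conservation("MAF", 0, seqs)
print(fraction, n_used)
```

Reporting the number of informative sequences next to each conservation value makes it visible when an epitope sits in a region that is poorly covered in the dataset.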

Thank you very much for your suggestions.
