Question

Best practices for analysing protein (amino acid) sequences?

4.7 years ago

vmax • 0

My background is in RNA-seq, but I’m now starting to work with protein (amino acid) sequences. There is a lot of literature on best practices and quality checks throughout RNA-seq analysis (from raw reads to, say, differential expression analysis or variant calling).

I’m having trouble finding similar literature for working with protein sequences. Things I’m thinking about:

1) If I need to filter redundant sequences, how do different sequence-similarity thresholds affect my results?
2) How do I check the accuracy of a multiple sequence alignment?
3) What about accuracy when I align to a structure?
4) Do gaps (insertions/deletions) need any special consideration?
5) Are there other things I’m not aware of?

Does anyone have resources that would help answer these questions?

In case it is relevant, I’m aligning multiple sequences to a structure, clustering, and evaluating point mutations.

alignment sequence • 843 views

ADD COMMENT • link 4.7 years ago by vmax • 0

Can you clarify what you want to achieve? What do you call a structure? What do you mean by evaluating point mutations?
1- Whether to filter out proteins with redundant sequences depends on your data set and your goal. For evaluating amino-acid variability, for example, you may want to keep all sequences.
2- For accuracy of MSAs, check papers that compare MSA methods/tools, like those mentioned in this post.
3- What's a structure in this case?
4- Are there reasons in the context you're working in to consider insertions/deletions? Where do your sequences come from?
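
To make points 1 and 4 a bit more concrete, here is a minimal sketch (not a recommended pipeline) of how the identity threshold changes what survives redundancy filtering, and how to see where gaps concentrate in an alignment. The toy sequences, the function names, and the choice to ignore gapped columns when computing identity are all assumptions made for illustration. In practice, dedicated tools such as CD-HIT or MMseqs2 are commonly used for this kind of filtering.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored, not counted as mismatches
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def filter_redundant(aligned, threshold):
    """Greedy filter: keep a sequence only if its identity to every
    previously kept sequence is strictly below the threshold."""
    kept = []
    for name, seq in aligned:
        if all(pairwise_identity(seq, k) < threshold for _, k in kept):
            kept.append((name, seq))
    return kept


def gap_fraction_per_column(aligned):
    """Fraction of sequences that have a gap in each alignment column."""
    seqs = [seq for _, seq in aligned]
    n = len(seqs)
    return [sum(s[i] == "-" for s in seqs) / n for i in range(len(seqs[0]))]


# Hypothetical toy alignment, made up for this example.
toy = [
    ("seq1", "MKTAYIAKQR"),
    ("seq2", "MKTAYIAKQR"),  # exact duplicate of seq1
    ("seq3", "MKTAYLAKQR"),  # one substitution relative to seq1
    ("seq4", "MRSA-IGKHR"),  # more divergent, one gap
    ("seq5", "MK--YIAKQR"),  # same as seq1 except for a two-residue deletion
]

for threshold in (0.95, 0.7, 0.5):
    names = [name for name, _ in filter_redundant(toy, threshold)]
    print(f"threshold {threshold}: kept {names}")

print("gap fraction per column:", gap_fraction_per_column(toy))
```

Note that with this identity definition seq5 is treated as redundant with seq1 even though it carries a deletion, so the way gaps are counted directly affects which sequences are filtered out. The greedy filter also depends on input order, which is one more thing to check when comparing thresholds.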

ADD REPLY • link 4.7 years ago by Jean-Karim Heriche 27k

Jean-Karim Heriche pointed you in the right direction, but you're asking a lot in a single question. Each of your points could be a question in its own right. You may want to narrow down which one(s) are giving you the most trouble so that you get more attention from high-throughput sequencing folks, people with experience in alignment, and structural biologists where applicable.

ADD REPLY • link 4.7 years ago by Brice Sarver ★ 3.8k