Question: Best practices for analysing protein (amino acid) sequences?
24 days ago, vmax wrote:

My background is in RNA-seq, but I’m now starting to work with protein (amino acid) sequences. There is a lot of literature on best practices and quality checks throughout RNA-seq analysis (from raw reads to, say, differential expression analysis or variant calling).

I’m having trouble finding similar literature for working with protein sequences. Things I’m thinking about are:
1) If I need to filter redundant sequences, how do different thresholds for sequence similarity affect my results?
2) How do I check the accuracy of a multiple sequence alignment?
3) What about accuracy when I align to a structure?
4) Do gaps (insertions/deletions) need any special consideration?
5) Are there other things I’m not aware of?

Does anyone have resources that would help answer these questions?

In case it matters, I’m aligning multiple sequences to a structure, clustering, and evaluating point mutations.


Can you clarify what you want to achieve? What do you call a structure? What do you mean by evaluating point mutations?
1- Whether to filter out proteins with redundant sequences depends on what you want to do. For evaluating amino-acid variability, you may want to keep all sequences, but it really depends on what your data set is and what your goal is.
2- For accuracy of MSAs, check papers that compare MSA methods/tools, like those mentioned in this post.
3- What's a structure in this case?
4- Are there reasons in the context you're working in to consider insertions/deletions? Where do your sequences come from?
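
To make points 1 and 4 a bit more concrete, here is a minimal Python sketch of how one could see the effect of a redundancy threshold and look at gaps per alignment column. Everything in it is an assumption for illustration: the toy sequences are made up, the function names are hypothetical, the input is assumed to be already aligned, and the greedy filter is only a stand-in for dedicated clustering tools such as CD-HIT or MMseqs2, not a replacement for them.

```python
from collections import Counter
from math import log2


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def filter_redundant(aligned, threshold):
    """Greedy filter: keep a sequence only if its identity to every
    previously kept sequence is below the threshold (order dependent)."""
    kept = []
    for name, seq in aligned:
        if all(pairwise_identity(seq, k) < threshold for _, k in kept):
            kept.append((name, seq))
    return kept


def column_stats(aligned, gap="-"):
    """Per column: (gap fraction, Shannon entropy in bits over non-gap residues)."""
    stats = []
    for column in zip(*(seq for _, seq in aligned)):
        residues = [r for r in column if r != gap]
        gap_fraction = 1 - len(residues) / len(column)
        total = len(residues)
        entropy = 0.0
        for n in Counter(residues).values():
            entropy -= (n / total) * log2(n / total)
        stats.append((gap_fraction, entropy))
    return stats


# Made-up aligned sequences, for illustration only.
toy = [
    ("seq1", "MKT-LLV"),
    ("seq2", "MKTALLV"),
    ("seq3", "MRTALIV"),
    ("seq4", "MKSGLIA"),
]

# How many sequences survive at different identity thresholds?
for threshold in (0.9, 0.6):
    names = [name for name, _ in filter_redundant(toy, threshold)]
    print(threshold, names)

# Columns with many gaps or high entropy are the ones to inspect by hand.
for i, (gap_fraction, entropy) in enumerate(column_stats(toy)):
    print(i, round(gap_fraction, 2), round(entropy, 2))
```

Rerunning the downstream analysis on the sets kept at a few different thresholds is one simple way to check whether your conclusions depend on the cutoff. Note that the greedy filter depends on input order, and that identity is computed here over ungapped columns only, which is just one of several possible definitions.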

written 24 days ago (modified 24 days ago) by Jean-Karim Heriche

Jean-Karim Heriche pointed you in the right direction, but you're asking a lot in a single question. Each of your points could be a question in its own right. You may want to narrow it down to the one(s) giving you the most trouble, so you'll get more attention from high-throughput sequencing folks, people with experience in alignment, and structural biologists where applicable.

written 23 days ago by Brice Sarver