Question: Best practices for analysing protein (amino acid) sequences?
15 months ago, vmax wrote:

My background is in RNA-seq, but I’m now starting to work with protein (amino acid) sequences. There is a lot of literature on best practices and quality checks throughout RNA-seq analysis (from raw reads to, say, differential expression analysis or variant calling).

I’m having trouble finding similar literature for working with protein sequences. Things I’m thinking about are:
1) If I need to filter redundant sequences, how do different sequence-similarity thresholds affect my results?
2) How do I check the accuracy of a multiple sequence alignment?
3) What about accuracy when I align to a structure?
4) Should I give any special consideration to gaps (insertions/deletions)?
5) Are there other things I’m not aware of?

Does anyone have resources that would help answer these questions?

In case it matters, I’m aligning multiple sequences to a structure, clustering, and evaluating point mutations.
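
On point 4, one simple sanity check is to look at how gappy each alignment column is before clustering or scoring mutations. The snippet below is only a minimal sketch, not an established protocol: it assumes the alignment is already loaded as equal-length strings with `-` as the gap character, and the function names and the 0.5 cutoff are arbitrary choices made for illustration.

```python
def column_gap_fractions(alignment):
    """Return, for each alignment column, the fraction of sequences with a gap ('-')."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    return [sum(seq[i] == "-" for seq in alignment) / n for i in range(length)]


def gappy_columns(alignment, max_gap_fraction=0.5):
    """Indices (0-based) of columns whose gap fraction exceeds the cutoff.

    The 0.5 default is an illustrative assumption, not a recommended value.
    """
    fractions = column_gap_fractions(alignment)
    return [i for i, frac in enumerate(fractions) if frac > max_gap_fraction]


# Toy alignment, made up for this example.
toy_alignment = [
    "MKT-AYI",
    "MKTGAYI",
    "MK--AYL",
    "MRT-AYI",
]

print(column_gap_fractions(toy_alignment))
print(gappy_columns(toy_alignment))
```

Columns flagged this way are candidates for closer inspection; whether they should be masked or kept depends on whether the insertions/deletions are themselves of interest.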

Tags: sequence alignment

Can you clarify what you want to achieve? What do you call a structure? What do you mean by evaluating point mutations?
1- Whether to filter out proteins with redundant sequences depends on what you want to do. For evaluating amino-acid variability, you may want to keep all sequences, but it really depends on what your data set is and what your goal is.
2- For the accuracy of MSAs, check papers that compare MSA methods and tools, like those mentioned in this post.
3- What's a structure in this case?
4- Is there a reason, in the context you're working in, to consider insertions/deletions? Where do your sequences come from?
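
To make point 1 concrete, the effect of the similarity threshold can be explored directly on your own data by filtering at several cutoffs and comparing what survives. The following is a rough sketch under stated assumptions, not the method of any particular tool: it assumes the sequences are already aligned (equal length, `-` for gaps), measures identity only over columns where neither sequence has a gap, and uses a simple greedy pass. The function names and the toy sequences are made up for illustration; dedicated clustering tools use their own alignment and heuristics and may give different results.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def filter_redundant(aligned, threshold):
    """Greedy filter: keep a sequence only if its identity to every
    already-kept sequence is below the threshold."""
    # Visit the least gapped sequences first so they become representatives.
    order = sorted(aligned, key=lambda name: aligned[name].count("-"))
    kept = []
    for name in order:
        if all(pairwise_identity(aligned[name], aligned[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy aligned sequences, invented for this example.
toy = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQK",
    "seqC": "MKSAYLAKHR",
    "seqD": "MR-GWLPEDN",
}

for threshold in (0.95, 0.8, 0.65):
    print(threshold, filter_redundant(toy, threshold))
```

Running the downstream analysis on each filtered set and checking whether the conclusions change is one way to see how sensitive the results are to the chosen cutoff.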

modified 15 months ago • written 15 months ago by Jean-Karim Heriche (23k)

Jean-Karim Heriche started you in the right direction, but you're asking a lot in a single question. Each of your points could be a question in its own right. You may want to narrow down which one(s) are giving you the most trouble so that you get more attention from high-throughput sequencing folks, people with experience in alignment, and structural biologists, as applicable.

written 15 months ago by Brice Sarver (3.5k)