Question: Best practices for analysing protein (amino acid) sequences?
Asked 11 months ago by vmax (Europe)

My background is in RNA-seq, but I’m now starting to work with protein (amino acid) sequences. There is a lot of literature on best practices and quality checks for every stage of RNA-seq analysis (from raw reads to, say, differential expression analysis or variant calling).

I’m having trouble finding similar literature for working with protein sequences. The things I’m thinking about are:
1) If I need to filter redundant sequences, how do different sequence-similarity thresholds affect my results?
2) How do I check the accuracy of a multiple sequence alignment?
3) What about accuracy when I align to a structure?
4) Should I give any special consideration to gaps (insertions/deletions)?
5) Are there other things I’m not aware of?

Does anyone have resources that would help answer these questions?

In case it is relevant: I’m aligning multiple sequences to a structure, clustering them, and evaluating point mutations.

sequence alignment • 230 views

Can you clarify what you want to achieve? What do you call a structure? What do you mean by evaluating point mutations?
1- Whether to filter out redundant sequences depends on what you want to do. For evaluating amino-acid variability, you may want to keep all sequences, but it really depends on your data set and your goal.
2- For accuracy of MSAs, check papers that compare MSA methods/tools, like those mentioned in this post.
3- What's a structure in this case?
4- Are there reasons in the context you're working in to consider insertions/deletions? Where do your sequences come from?
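
To make points 1 and 4 a bit more concrete, here is a minimal sketch of how a redundancy threshold changes which sequences survive, and of how gappy each alignment column is. Everything in it is an assumption for illustration: the toy alignment, the names `pairwise_identity`, `greedy_filter` and `gap_fraction_per_column`, and in particular the identity definition (matches divided by columns where neither sequence has a gap; other definitions use a different denominator and give different numbers). The greedy filter is only loosely in the spirit of dedicated clustering tools and is not a substitute for them.

```python
from typing import Dict, List

GAP = "-"

# Hypothetical toy alignment (all rows have the same length).
toy_alignment: Dict[str, str] = {
    "seqA": "MKT-AYIAKQR",
    "seqB": "MKT-AYIAKQR",  # exact duplicate of seqA
    "seqC": "MKTGAYLAKQR",  # one substitution and one insertion vs seqA
    "seqD": "MRS-GFLSR-K",  # distant homologue
}


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_filter(aln: Dict[str, str], threshold: float) -> List[str]:
    """Keep a sequence only if it is below `threshold` identity to every kept one.

    Sequences are visited in insertion order, so the result depends on that order.
    """
    kept: List[str] = []
    for name, seq in aln.items():
        if all(pairwise_identity(seq, aln[k]) < threshold for k in kept):
            kept.append(name)
    return kept


def gap_fraction_per_column(aln: Dict[str, str]) -> List[float]:
    """Fraction of rows that have a gap at each alignment column."""
    seqs = list(aln.values())
    n_rows = len(seqs)
    n_cols = len(seqs[0])
    return [sum(s[i] == GAP for s in seqs) / n_rows for i in range(n_cols)]


# Show how the retained set shrinks as the threshold is lowered.
for t in (1.0, 0.9, 0.1):
    print(f"threshold {t:.1f}: kept {greedy_filter(toy_alignment, t)}")

# Columns with a high gap fraction are candidates for closer inspection.
print(gap_fraction_per_column(toy_alignment))
```

Running the filter at several thresholds and checking whether the downstream result (for example, per-position variability) stays stable is one simple way to see how sensitive an analysis is to the cut-off.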

Written 11 months ago by Jean-Karim Heriche (modified 11 months ago)

Jean-Karim Heriche started you in the right direction, but you're asking a lot in a single question. Each of your points could be a question in its own right. You may want to narrow down which one(s) are giving you the most trouble, so that you get more attention from high-throughput sequencing folks, people with experience in alignment, and structural biologists where applicable.

Written 11 months ago by Brice Sarver