RNASeq read coverage in protein space of MSA?
3.8 years ago
CephBirk ▴ 20

I used samtools depth to get the per-base read coverage over a number of isoform contigs from my Trinity assembly. I have also built a multiple sequence alignment of those isoforms using Clustal Omega. Now, as a final step, I want to assess the per-residue read coverage of the consensus amino acid sequence, to give an idea of how much confidence we can have in that sequence.

Does anyone have any tips on how to go about doing this and whether there are existing tools for such things? I'd much prefer not to reinvent the wheel if I don't have to...

A few questions:

• Is it correct that you are switching from nucleotides to amino acids? Since each amino acid maps directly to a codon in the cDNA, the read coverage of your isoform taken three bases at a time will be the same as the coverage of the corresponding amino acid.

• What do you consider the consensus amino acid residue? Majority rule?
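
To make the codon point concrete, here is a minimal sketch in Python of collapsing per-base depth into one value per residue. It assumes the three-column text output of samtools depth (contig, 1-based position, depth) and that you already know where the coding sequence starts on each contig; `read_depth`, `codon_depth`, and `cds_start` are hypothetical names used for illustration, not part of any existing tool. Positions absent from the depth file (samtools depth skips zero-coverage positions unless run with -a) are treated as depth 0.

```python
from collections import defaultdict


def read_depth(lines):
    """Parse samtools depth text into {contig: {position: depth}}."""
    depth = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue
        contig, pos, d = line.split()[:3]
        depth[contig][int(pos)] = int(d)
    return depth


def codon_depth(pos_depth, cds_start, n_codons, reduce=min):
    """Collapse per-base depth to one value per codon (one per residue).

    cds_start is the 1-based contig position of the first base of codon 1
    (an assumption: it has to come from your own ORF annotation).
    reduce picks how the three bases are summarised; min is conservative.
    """
    per_residue = []
    for i in range(n_codons):
        first = cds_start + 3 * i
        bases = [pos_depth.get(p, 0) for p in range(first, first + 3)]
        per_residue.append(reduce(bases))
    return per_residue
```

Taking the minimum over the three bases reports the weakest-supported base of each codon; passing a mean function instead would smooth over small dips.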

I know I could just stick with the cDNA, but the rest of the paper refers to specific amino acid residues, so I thought it would be nice to show, for example, that amino acid residue X is supported by a read depth of 400. As for your second question: yes, I'm taking the majority rule as the consensus. Fortunately, in this case, there aren't any hard calls, just a few cases where 5 of the 6 contigs match and one has an extra bp or something like that.
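
One way this could be put together, as a rough sketch rather than an existing tool: walk the protein alignment column by column, take the majority symbol as the consensus, and sum the per-residue depth of the contigs that agree with it. The function name `column_support` and both input structures are assumptions for illustration: `aligned` holds the gapped protein sequences from the alignment, and `residue_depth` holds one depth value per ungapped residue for each contig. Gaps take part in the vote and contribute depth 0, and ties are not handled specially.

```python
from collections import Counter


def column_support(aligned, residue_depth):
    """Return (consensus_symbol, n_agreeing, summed_depth) per alignment column.

    aligned: {name: gapped protein sequence}, all of equal length.
    residue_depth: {name: [depth per ungapped residue]}.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    cursor = dict.fromkeys(names, 0)  # next ungapped residue index per sequence
    result = []
    for col in range(length):
        seen = []  # (symbol, depth) for every sequence at this column
        for name in names:
            sym = aligned[name][col]
            if sym == "-":
                seen.append((sym, 0))
            else:
                seen.append((sym, residue_depth[name][cursor[name]]))
                cursor[name] += 1
        # Majority rule: the most frequent symbol in the column wins.
        cons, n_agree = Counter(s for s, _ in seen).most_common(1)[0]
        support = sum(d for s, d in seen if s == cons)
        result.append((cons, n_agree, support))
    return result


if __name__ == "__main__":
    aligned = {"a": "MK-L", "b": "MKAL", "c": "MR-L"}
    depth = {"a": [10, 20, 30], "b": [1, 2, 3, 4], "c": [5, 6, 7]}
    for col in column_support(aligned, depth):
        print(col)
```

Note that summing depth across contigs may count the same read more than once if reads were allowed to map to several isoforms, so the totals are best read as an upper bound on support.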