Is there any way to build a consensus sequence other than using multiple sequence alignment programs?
I've written some consensus modules (for PacBio correction) that generate consensus sequences from mappings of reads to a single representative. If this is what you are looking for, and if you could share some more information about what you want to achieve, I might be able to provide you with a suitable script.
Thanks. I have PCR amplicons sequenced with MiSeq and grouped by unique molecular identifier (UMI).
Yes, I want to generate a single representative to correct the sequencing errors. I tried MSA tools such as muscle and clustalo to get the consensus, but it sometimes doesn't work, and I have cases like the one below.
For example, look at positions 2 and 8. I'm not sure how to handle a situation like position 2, where I'd like a consensus without a non-nucleotide character. For position 8, I would prefer to exclude seq 4, since the majority of the sequences in the group don't have a 'G' at the 8th position.
seq1 ATGCTCG - TCGTTTCGGT
seq2 AGACTCG - TCGTTTCGGT
seq3 ATCCTCG - TCGTTTCGGT
cons A+GCTCG - TCGTTTCGGT
Let me know what you think or if you need more details. It would be great if you can share the script.
Ah, sorry, I somehow missed your post. Okay, unfortunately none of my scripts will work out of the box for your data. However, I think I could adjust one of them. But before we go there, let me put another idea out there. Have you thought about simply assembling your fragments, one assembly per group? This should produce a consensus as well.
If I understand your setup correctly, all fragments of one UMI group are roughly the same length (unless partial) and are full-length sequencing products of the same target region?
If assembly does not work, we can think about the consensus script. In your scenario, it would randomly choose one base at a 50/50 SNP location and would remove the minority gap. It can also handle reverse-complement reads (which MSA programs usually cannot), although it would require reads to be at least 50 bp long.
G (in 50% of runs)
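To make the idea concrete, here is a minimal Python sketch of a column-wise majority vote over reads that are already aligned to equal length. It is not the PacBio script mentioned above: the function name `column_consensus`, the rule that a base wins over a gap on a tie, and the `rng` parameter are assumptions for illustration, and it does not handle reverse-complement reads.

```python
import random
from collections import Counter


def column_consensus(aligned, rng=None):
    """Majority-vote consensus from gapped, equal-length aligned reads.

    Ties between bases are broken at random; columns where the gap ('-')
    is the majority are dropped, so minority insertions disappear.
    """
    if not aligned:
        raise ValueError("need at least one aligned read")
    length = len(aligned[0])
    if any(len(read) != length for read in aligned):
        raise ValueError("all aligned reads must have the same length")
    rng = rng or random.Random()

    consensus = []
    for column in zip(*aligned):
        counts = Counter(column)
        top = max(counts.values())
        # Symbols sharing the highest count, sorted for reproducibility.
        winners = sorted(sym for sym, n in counts.items() if n == top)
        # Assumption: on a tie, prefer a real base over a gap.
        bases = [sym for sym in winners if sym != "-"]
        if bases:
            consensus.append(rng.choice(bases))
        # Otherwise the gap is the sole winner: skip the column.
    return "".join(consensus)


if __name__ == "__main__":
    group = [
        "ATGCTCG-TCGTTTCGGT",
        "AGACTCG-TCGTTTCGGT",
        "ATCCTCG-TCGTTTCGGT",
    ]
    print(column_consensus(group, rng=random.Random(0)))
```

With the three reads from the example, the gap-only column is dropped and the column where all three reads disagree gets one of the observed bases at random, so the output contains only nucleotides.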
Right now, the script expects PacBio reads and parses header information etc. To make it work for you I do need a bit more information. Ideally and if in any way possible, a small sample data set - one group of fragments for which you want a single representative consensus.
Let me know what you think
No worries, thanks for your kind reply. I'm not sure if the assembly would work. Yes, the fragments in one UMI group are roughly the same length and cover the same target region. The fragments are 400-500 bp long. The consensus script sounds like a good solution if it adds the random base (although it would be better to pick a base that doesn't introduce a stop codon) and removes the minority gap. I'm now getting some help from the local bioinformatician here; we'll see how it goes. I will come back to you with more information and a small dataset if I don't find a solution. I appreciate your help. Thanks!
Sure, good luck. And I'd be interested to know if you come up with a good solution.
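The stop-codon preference could be sketched as a small tie-breaking helper, shown here in Python. This is only an illustration: the function name `pick_non_stop_base`, the `frame` parameter, and the assumption that the reading frame of the amplicon is known are all hypothetical and not part of any existing script.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def pick_non_stop_base(seq, pos, candidates, frame=0):
    """Return the first candidate base for `pos` that avoids a stop codon.

    `seq` is the ungapped consensus so far, `pos` is the 0-based tied
    position, and `frame` is the assumed 0-based offset of the first codon.
    Falls back to the first candidate if every option gives a stop codon
    or if `pos` lies before the first complete codon.
    """
    start = pos - ((pos - frame) % 3)  # first base of the codon covering pos
    if start < 0:
        return candidates[0]
    for base in candidates:
        trial = seq[:pos] + base + seq[pos + 1:]
        codon = trial[start:start + 3]
        # Incomplete codons at the 3' end cannot be stop codons.
        if len(codon) < 3 or codon not in STOP_CODONS:
            return base
    return candidates[0]


if __name__ == "__main__":
    # Position 4 is tied between A and C; A would give the stop codon TAA.
    print(pick_non_stop_base("ATGTNA", 4, ["A", "C"]))
```

A helper like this would replace the random choice only at tied positions; positions with a clear majority would still take the majority base.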
Is your data 16S amplicon? This paper should provide some insights.
Thanks, I know this paper. It's B cell receptor sequencing data.
Is it barcoded individuals (or cell culture, or whatever), or is it a pool? Did you normalize before sequencing?
It's barcoded cDNA molecules, normalized before sequencing.
Do you expect variability per barcoded sample? Can't you filter by coverage?