Question: Building consensus without MSA
GP (10), Sweden, wrote 4.7 years ago:

Hi All,

Is there any way to build a consensus sequence other than using multiple sequence alignment programs?

Thanks!

sequence • 1.3k views
modified 4.7 years ago by h.mon (29k) • written 4.7 years ago by GP (10)

I've written some consensus modules (for PacBio correction) that generate consensus sequences from mappings of sequences to a single representative. If this is what you are looking for, and if you could share some more information about what you want to achieve, I might be able to provide you with a suitable script.

written 4.7 years ago by thackl (2.8k)

Thanks. I have PCR amplicons sequenced on a MiSeq and grouped by unique molecular identifier (UMI).

Yes, I want to generate a single representative to correct the sequencing errors. I tried MSA tools such as MUSCLE and Clustal Omega to get the consensus, but this does not always work, and I have cases like the one below.

For example, look at positions 2 and 8. At position 2, I'm not sure how to get a consensus without a non-nucleotide character. At position 8, I would prefer to exclude seq4, since the majority of the sequences in the group don't have a 'G' there.

seq1 ATGCTCG-TCGTTTCGGT
seq2 AGACTCG-TCGTTTCGGT
seq3 ATCCTCG-TCGTTTCGGT
seq4 AGGCTCGGTCGTTTCGGT

cons A+GCTCG-TCGTTTCGGT

Let me know what you think or if you need more details. It would be great if you can share the script.

Thanks

modified 4.7 years ago • written 4.7 years ago by GP (10)
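
For illustration, the behaviour asked for in the post above (a majority vote per alignment column, dropping a column when the gap is the majority, and flagging columns with no clear winner) could be sketched in Python roughly as follows. This is a minimal sketch, not a script from this thread: `column_consensus` is a hypothetical name, it assumes the reads of one UMI group are already aligned to equal length, and taking the first-seen base at a tie is only a placeholder rule.

```python
from collections import Counter


def column_consensus(aligned):
    """Majority-vote consensus over equal-length, gapped sequences.

    Returns (consensus, ties), where ties lists (column_index, tied_bases)
    for every kept column in which no single base wins.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus, ties = [], []
    for col, column in enumerate(zip(*aligned)):
        bases = Counter(b for b in column if b != "-")
        n_bases = sum(bases.values())
        n_gaps = len(column) - n_bases
        if n_gaps > n_bases:
            continue  # the gap is the majority here: drop the column
        top = max(bases.values())
        tied = [b for b, n in bases.items() if n == top]
        if len(tied) > 1:
            ties.append((col, tied))
        consensus.append(tied[0])  # placeholder: first-seen base wins a tie
    return "".join(consensus), ties


group = [
    "ATGCTCG-TCGTTTCGGT",
    "AGACTCG-TCGTTTCGGT",
    "ATCCTCG-TCGTTTCGGT",
    "AGGCTCGGTCGTTTCGGT",
]
consensus, ties = column_consensus(group)
print(consensus)  # ATGCTCGTCGTTTCGGT
print(ties)       # [(1, ['T', 'G'])]
```

The column indices in `ties` are 0-based, so index 1 is "position 2" in the example. The extra 'G' of seq4 at position 8 disappears because three of the four reads have a gap there.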

Ah, sorry, I somehow missed your post. Okay, unfortunately none of my scripts will work out of the box for your data. However, I think I could adjust one of them. But before we go there, let me put another idea out there. Have you thought about simply assembling your fragments, one assembly per group? This should produce a consensus as well.

If I understand your setup correctly, all fragments of one UMI group are roughly the same length (unless partial) and are full-length sequencing products of the same target region?

If assembly does not work, we can think about the consensus script. In your scenario, it would randomly choose one base at a 50/50 SNP location and would remove the minority gap. It can also handle reverse-complement reads (which MSA programs usually cannot), although it would require reads to be at least 50 bp long.

ATGCTCGTCGTTTCGGT
 G (in 50% of runs)

Right now, the script expects PacBio reads and parses header information etc. To make it work for you, I need a bit more information: ideally, if at all possible, a small sample data set, i.e. one group of fragments for which you want a single representative consensus.
Let me know what you think.

written 4.7 years ago by thackl (2.8k)
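
Reverse-complement reads are mentioned above as something MSA programs usually cannot handle. One simple way to bring reads into a common orientation before building a consensus is to compare, in both orientations, how many k-mers a read shares with a representative sequence. The sketch below is an assumption made for illustration and not a description of how the script discussed here works; `revcomp`, `kmers`, `orient` and the default `k=8` are hypothetical choices.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read, reference, k=8):
    """Return read or its reverse complement, whichever shares more
    k-mers with reference. The read is kept as-is when the counts tie."""
    ref_kmers = kmers(reference, k)
    forward = len(kmers(read, k) & ref_kmers)
    reverse = len(kmers(revcomp(read), k) & ref_kmers)
    return read if forward >= reverse else revcomp(read)


reference = "ATGCTCGTCGTTTCGGT"
flipped = revcomp(reference)
print(orient(flipped, reference, k=5) == reference)  # True
```

For reads of 400-500 bp a larger `k` would probably be more appropriate. The value 5 is used here only because the toy sequence is 17 bases long.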

Hi,

No worries, thanks for your kind reply. I'm not sure if the assembly would work. Yes, the fragments in one UMI group are roughly the same length and come from the same target region. The fragments are 400-500 bp long. The consensus script sounds like a good solution if it adds a random base at tied positions (although it would be better to pick a base that doesn't produce a stop codon) and removes the minority gap. I'm now getting some help from a local bioinformatician and will see how it goes. If I don't find a solution, I will come back to you with more information and a small dataset. I appreciate your help. Thanks!

modified 4.7 years ago • written 4.7 years ago by GP (10)
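
The preference stated above, a random base at a tied position but ideally one that does not create a stop codon, could be sketched like this. It is a hedged example under stated assumptions: `resolve_ties` is a hypothetical helper, the reading frame (`frame`, the 0-based offset of the first codon) is assumed to be known, tie positions are given in consensus coordinates, and each tie is checked only against the single codon that covers it.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def resolve_ties(consensus, ties, frame=0, rng=None):
    """Pick a base for each tied position, avoiding stop codons if possible.

    ties maps a 0-based consensus position to its candidate bases.
    Falls back to a random candidate when every choice gives a stop codon.
    """
    rng = rng or random.Random()
    seq = list(consensus)
    for pos, candidates in sorted(ties.items()):
        start = pos - (pos - frame) % 3  # first base of the codon covering pos
        safe = []
        for base in candidates:
            seq[pos] = base
            # Positions before the frame start are treated as non-coding.
            codon = "".join(seq[start:start + 3]) if start >= 0 else ""
            if codon not in STOP_CODONS:
                safe.append(base)
        seq[pos] = rng.choice(safe or list(candidates))
    return "".join(seq)


# 'A' at position 4 would give TAG (stop); 'G' gives TGG, so 'G' is chosen.
print(resolve_ties("ATGTAGCTC", {4: ["A", "G"]}))  # ATGTGGCTC
```

Passing a seeded `random.Random` as `rng` makes the random choice reproducible between runs, which may be useful when the same UMI group is processed more than once.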

Sure, good luck. I'd be interested to know if you come up with a good solution.

written 4.7 years ago by thackl (2.8k)
h.mon (29k), Brazil, wrote 4.7 years ago:

Is your data 16S amplicon? This paper should provide some insights.

written 4.7 years ago by h.mon (29k)

Thanks, I know this paper. It's B cell receptor sequencing data.

written 4.7 years ago by GP (10)

Is it barcoded individuals (or cell culture, or whatever), or is it a pool? Did you normalize before sequencing?

written 4.7 years ago by h.mon (29k)

It's barcoded cDNA molecules, normalized before sequencing.

written 4.7 years ago by GP (10)

Do you expect variability per barcoded sample? Can't you filter by coverage?

written 4.7 years ago by h.mon (29k)

Yes. No.

modified 3 months ago by RamRS (25k) • written 4.7 years ago by GP (10)