Question

Building consensus without MSA

8.9 years ago

GP ▴ 10

Hi All,

Is there any way to build a consensus sequence other than using multiple sequence alignment programs?

Thanks!

sequence • 2.6k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.9 years ago by GP ▴ 10

I've written some consensus modules (for PacBio correction) that generate consensus sequences from mappings of sequences to a single representative. If this is what you are looking for, and if you could share some more information about what you want to achieve, I might be able to provide you with a suitable script.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by thackl ★ 3.0k

Thanks. I have PCR amplicons sequenced on a MiSeq and grouped by unique molecular identifier (UMI).

Yes, I want to generate a single representative to correct the sequencing errors. I tried MSA tools such as MUSCLE and Clustal Omega to get the consensus, but that doesn't always work, and I have cases that look like the one below.

For example, look at positions 2 and 8. I'm not sure how to handle a situation like position 2 so that the consensus contains no non-nucleotide character. For position 8, I would prefer to exclude seq4, since the majority of the sequences in the group don't have a 'G' at the 8th position.

seq1 ATGCTCG-TCGTTTCGGT
seq2 AGACTCG-TCGTTTCGGT
seq3 ATCCTCG-TCGTTTCGGT
seq4 AGGCTCGGTCGTTTCGGT
cons A+GCTCG-TCGTTTCGGT
      ^     ^

Let me know what you think or if you need more details. It would be great if you can share the script.

Thanks

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by GP ▴ 10

Ah, sorry, I somehow missed your post. Okay, unfortunately none of my scripts will work out of the box for your data. However, I think I could adjust one of them. But before we go there, let me put another idea out there. Have you thought about simply assembling your fragments, one assembly per group? This should produce a consensus as well.

If I understand your setup correctly, all fragments of one UMI group are roughly the same length (unless partial) and are full-length sequencing products of the same target region?

If assembly does not work, we can think about the consensus script. In your scenario, it would randomly choose one base at a 50/50 SNP location and would remove the minority gap. It can also handle reverse-complement reads (which MSA programs usually cannot), although it would require reads to be at least 50 bp long.

ATGCTCGTCGTTTCGGT
 G (in 50% of runs)
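
To make the behaviour described above concrete, here is a minimal Python sketch of a per-column majority vote over reads that are already aligned. It is not the PacBio script mentioned in this thread; the function name `column_consensus` and the rule that a real base beats a gap on a tie are assumptions made for illustration. Columns where a gap is the majority are dropped, and a 50/50 base tie is resolved at random.

```python
import random
from collections import Counter


def column_consensus(aligned, rng=None):
    """Majority-vote consensus over equal-length aligned reads ('-' is a gap)."""
    rng = rng or random.Random()
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(column)
        top = max(counts.values())
        winners = sorted(sym for sym, n in counts.items() if n == top)
        # Assumption: on a tie, a real base is preferred over a gap.
        bases = [sym for sym in winners if sym != "-"] or winners
        choice = rng.choice(bases)  # random pick among tied bases
        if choice != "-":  # a majority gap removes the column
            consensus.append(choice)
    return "".join(consensus)


# The four reads from the example earlier in this thread.
reads = [
    "ATGCTCG-TCGTTTCGGT",
    "AGACTCG-TCGTTTCGGT",
    "ATCCTCG-TCGTTTCGGT",
    "AGGCTCGGTCGTTTCGGT",
]
print(column_consensus(reads))
```

On these reads the result is 17 bases long: position 2 comes out as either T or G, and the extra G that only seq4 carries at position 8 is dropped.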

Right now, the script expects PacBio reads and parses header information etc. To make it work for you, I need a bit more information: ideally, if at all possible, a small sample data set, i.e. one group of fragments for which you want a single representative consensus.

Let me know what you think

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by thackl ★ 3.0k

Hi,

No worries, thanks for your kind reply. I'm not sure whether assembly would work. Yes, the fragments in one UMI group are roughly the same length and cover the same target region. The fragments are 400-500 bp long. The consensus script sounds like a good solution if it adds a random base at tied positions (although it would be better to pick a base that doesn't create a stop codon) and removes the minority gap. I'm now getting some help from the local bioinformatician here, so I will see how it goes. I will come back to you with more information and a small dataset if I don't find a solution. I appreciate your help. Thanks!
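
The stop-codon preference mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the reading frame of the ungapped consensus is assumed to be known (`frame` is the 0-based offset of the first codon), and the function name `non_stop_candidates` is hypothetical. It filters the tied bases down to those that do not turn the codon covering that position into a stop codon, and falls back to all candidates if every choice would.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def non_stop_candidates(seq, pos, candidates, frame=0):
    """Return the tied bases at `pos` that do not create a stop codon.

    `seq` is the ungapped consensus with any placeholder at `pos`;
    `frame` (assumed known) is the 0-based offset of the first codon.
    """
    start = pos - ((pos - frame) % 3)  # first base of the codon covering pos
    if start < 0 or start + 3 > len(seq):
        return list(candidates)  # incomplete codon: nothing to check
    keep = []
    for base in candidates:
        trial = seq[:pos] + base + seq[pos + 1:]
        if trial[start:start + 3] not in STOP_CODONS:
            keep.append(base)
    # If every candidate yields a stop codon, keep them all.
    return keep or list(candidates)


# Toy example: 'A' at index 4 would give the stop codon TAA in frame 0.
print(non_stop_candidates("ATGTNACGT", 4, ["A", "C"]))
```

A random choice among the returned bases would then replace the plain 50/50 pick.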

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by GP ▴ 10

Sure, good luck. And I'd be interested to know if you come up with a good solution.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by thackl ★ 3.0k

Ram · Answer 1 · 2015-06-12

8.9 years ago

h.mon 35k

Is your data 16S amplicon? This paper should provide some insights.

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.9 years ago by h.mon 35k

Thanks, I know this paper. It's B-cell receptor sequencing data.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.9 years ago by GP ▴ 10

Is it barcoded individuals (or cell culture, or whatever), or is it a pool? Did you normalize before sequencing?

ADD REPLY • link 8.9 years ago by h.mon 35k

It's barcoded cDNA molecules, normalized before sequencing.

ADD REPLY • link 8.9 years ago by GP ▴ 10

Do you expect variability per barcoded sample? Can't you filter by coverage?

ADD REPLY • link 8.9 years ago by h.mon 35k

Yes. No.

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 8.9 years ago by GP ▴ 10