Question: merging a number of overlapping sanger sequences
3.3 years ago by
thomas.welch40 wrote:

Hi there,

I have 80 DNA samples for which we have sequenced three overlapping sections of a large gene using the Sanger method. I am now looking for a way to merge these sequenced segments into a single sequence per sample, so that the samples can be aligned for analysis.

I have come across a couple of tools that do this for just two overlapping sequences (such as EMBOSS), and I've seen that it can be done in BioEdit one sample at a time, but is there a tool that lets me do this in bulk? Or will I have to align and assemble them as I would with NGS data from a genome?

Thankful for any answers.

Kind Regards, Tom

alignment merge sequence • 3.1k views
ADD COMMENTlink modified 2.3 years ago by ferroao20 • written 3.3 years ago by thomas.welch40

You should give from BBMap a try. It should work with fasta formatted sequences.

ADD REPLYlink written 3.3 years ago by GenoMax94k

This would be trivially simple if you had access to Sequencher, DNASTAR, or ContigExpress from Vector NTI, among others (note: these are all commercial software packages and are not free). The Consed suite will work as well, but it requires signing an academic agreement and some effort on your part to install everything.

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by GenoMax94k
3.3 years ago by
Fabio Marroni2.6k wrote:

I would suggest the phred/phrap/consed suite. It was widely used in the "old" Sanger days and worked pretty well. Consed is a "finishing" tool, but it is also useful for visualizing assemblies and correcting errors. The major drawback is that you might have to invest some time in learning how to use these tools.

ADD COMMENTlink written 3.3 years ago by Fabio Marroni2.6k
2.3 years ago by
ferroao20 wrote:

If you have a FASTA file with all sequences, you can use this R script:

# Install the dependencies needed by sangeranalyseR first, e.g.
# BiocManager::install("DECIPHER")
# then install the sangeranalyseR package itself.

library(Biostrings)      # provides DNAStringSet()
library(sangeranalyseR)  # merge.reads() is assumed to come from this package

setwd("~/your folder")

# read a FASTA file containing several sequences
fastas <- seqinr::read.fasta("myFastas.fas", as.string = TRUE)
# convert to a DNAStringSet, keeping the sequence names
reads <- DNAStringSet(as.character(fastas))
names(reads) <- names(fastas)
# merge the sequences; the result includes the consensus
merged.reads <- merge.reads(reads)

# write the consensus to file
seqinr::write.fasta(as.character(merged.reads$consensus), "consensus", file.out = "cons.fas", nbchar = 100000, as.string = TRUE)

or this python script

python3.4 -f myfastas.fas -r myout.fas

Script in: Copied from Rosa Tung
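
For readers who want to see the idea rather than run a packaged script, here is a minimal Python sketch of merging overlapping segments per sample. It is not the linked script and not what sangeranalyseR does. It assumes exact suffix/prefix overlaps (no sequencing errors, no reverse-complemented reads), and it assumes FASTA headers of the form `sample01_seg1`, which is a hypothetical naming scheme. The function names and the `min_overlap` parameter are made up for illustration.

```python
from collections import defaultdict


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def overlap_length(left, right, min_overlap=20):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Only exact matches count; returns 0 if none reaches `min_overlap`.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_segments(segments, min_overlap=20):
    """Join segments given in 5'->3' order, collapsing each exact overlap."""
    merged = segments[0]
    for segment in segments[1:]:
        size = overlap_length(merged, segment, min_overlap)
        if size == 0:
            raise ValueError("no exact overlap of at least %d bases" % min_overlap)
        merged += segment[size:]
    return merged


def merge_by_sample(records, min_overlap=20):
    """Group records by sample name and merge each group.

    Assumes headers like 'sample01_seg1' (hypothetical): the text before the
    last underscore is the sample, the text after it orders the segments.
    """
    groups = defaultdict(list)
    for header, seq in records:
        sample, _, segment = header.rpartition("_")
        groups[sample].append((segment, seq))
    return {
        sample: merge_segments([seq for _, seq in sorted(parts)], min_overlap)
        for sample, parts in groups.items()
    }
```

With a file such as myFastas.fas, `merge_by_sample(read_fasta("myFastas.fas"))` would return one merged sequence per sample. Real Sanger reads usually contain low-quality ends and mismatches, so an alignment-based tool is likely the safer choice for actual data.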

ADD COMMENTlink modified 21 months ago • written 2.3 years ago by ferroao20

Hi ferroao

You claim you've "forked" your code from, but you've actually copied over their code to a different git site (github vs gitlab). Also, all you've done is made the input and output files command line arguments (and added an unnecessary step to strip empty lines). You have not changed any of the underlying algorithm. Have you at least addressed the 50-sequence, 1000-length, ATCG-only limitations?

I'd like to understand why you're spamming old threads with a script you did not author, when the script is 2 years old, has so many limitations, and was written as part of what is definitely a Rosalind challenge.

If you sincerely think the script is performant, please create a Tool type post for it.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by _r_am32k

I think my answer is appropriate to this question. You can use the moderate option if you wish. Most of the limitations you mention concern only the example.txt, not the script. Best,

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by ferroao20

No, but I do not appreciate code adapted from repositories without due attribution, especially when the contribution after adaptation is negligible. The code is from a Rosalind challenge by an amateur coder, so I am pretty sure it is not as good as established, tested tools. In addition, going back to year-old posts to add an answer advertising a poor, ill-adapted solution is not recommended. I will not use the moderate option, as what you're doing is not inappropriate, just a little ill-advised.

ADD REPLYlink written 2.3 years ago by _r_am32k