Question: Find consensus sequence of several DNA sequences
13 months ago, Bella_p wrote:

Hi!

I have a list of around 200 different DNA sequences, each ~150 bp long, and I'd like to find a consensus sequence for all of them. There is probably a function for this that I'm not familiar with. Does anyone know which package/function to use? I'd prefer Python, but R is also OK.

Thanks!


Have you tried a multiple sequence alignment?

Any of these tools should provide you with a consensus sequence:

https://www.ebi.ac.uk/Tools/msa/

written 13 months ago by YaGalbi
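
To expand on the alignment point: a consensus is called column by column, so the sequences have to be aligned to the same length first. Below is a minimal sketch, using only the standard library, of a quick check for whether a FASTA file already looks aligned. The helper names read_fasta and looks_aligned are made up for this example and are not part of any package. Equal lengths are necessary but not sufficient, so this only tells you when an alignment step is definitely still missing.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}.

    Note: records with identical headers overwrite each other in this sketch.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def looks_aligned(records):
    """Return True if every sequence has the same length."""
    lengths = {len(seq) for seq in records.values()}
    return len(lengths) <= 1


example = """>seq1
ACGT-A
>seq2
ACGTTA
>seq3
ACG
"""
records = read_fasta(example)
print(looks_aligned(records))  # seq3 is shorter, so this prints False
```

If this reports False, run the sequences through one of the MSA tools linked above before computing a consensus.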
13 months ago, st.ph.n (Philadelphia, PA) wrote:

You can use Biopython to create a consensus sequence.

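#!/usr/bin/env python

import sys
from Bio import AlignIO
from Bio.Align import AlignInfo

# The input must be an aligned FASTA file (all sequences the same length).
alignment = AlignIO.read(sys.argv[1], 'fasta')
summary_align = AlignInfo.SummaryInfo(alignment)

# The threshold is a fraction between 0 and 1, e.g. 0.5.
consensus = summary_align.dumb_consensus(threshold=float(sys.argv[2]))
print(consensus)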

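Save as consensus.py and run as python consensus.py input.fasta x, where x is the fraction of sequences (between 0 and 1) that must agree to call a position in the consensus sequence. For example, python consensus.py input.fasta 0.5 means a residue or nucleotide has to be present in at least 50% of the sequences to be called at that position; positions that don't reach the threshold are reported as the ambiguous character (X by default). The input has to be an aligned FASTA file, so run a multiple sequence alignment first if your sequences aren't aligned yet.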
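
To make the threshold idea concrete, here is a rough pure-Python sketch of a column-wise threshold consensus. It is not Biopython's implementation; the function name threshold_consensus, the choice to ignore gap characters when counting, and the handling of ties are all assumptions made for illustration.

```python
from collections import Counter


def threshold_consensus(seqs, threshold=0.5, ambiguous="X"):
    """Column-wise consensus of equal-length, pre-aligned sequences."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*seqs):
        # Assumption of this sketch: gap characters are not counted.
        residues = [c for c in column if c not in "-."]
        if not residues:
            consensus.append(ambiguous)
            continue
        counts = Counter(residues).most_common()
        top_base, top_count = counts[0]
        # A tie for the most common residue is treated as ambiguous.
        tied = len(counts) > 1 and counts[1][1] == top_count
        if not tied and top_count / len(residues) >= threshold:
            consensus.append(top_base)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


aligned = ["ACGTA", "ACGTC", "ACCTG", "A-GTT"]
print(threshold_consensus(aligned, threshold=0.5))  # ACGTX
print(threshold_consensus(aligned, threshold=0.8))  # ACXTX
```

In the third column G appears in 3 of 4 sequences (0.75), so it is called at a threshold of 0.5 but becomes X at 0.8. The last column has four different bases, so it is ambiguous at either threshold.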
