Question: Tool/Script To Get Consensus Sequences
6.7 years ago by
United States
upendrakumar.devisetty350 wrote:

I have three FASTA files, and all I want is the consensus or union of the three. That is, I want to pull out all the sequences that are common to the three files and put them in a separate FASTA file. The three files were generated differently and have different headers, but the shared sequences are identical across the files. Is there a tool or script that does this?

modified 6.7 years ago by Damian Kao15k • written 6.7 years ago by upendrakumar.devisetty350

If there is no existing tool or script, this is a fairly basic bioinformatics programming task that you should learn how to do.

written 6.7 years ago by Neilfws48k

I do understand that it is basic bioinformatics programming, but I thought that if something had already been written, I could just use it rather than write it myself.

written 6.7 years ago by upendrakumar.devisetty350

How large are the FASTA files?

written 6.7 years ago by Damian Kao15k

The files are no larger than 100 MB.

written 6.7 years ago by upendrakumar.devisetty350

Search for "multiple sequence alignment" in a search engine or encyclopedia of your choice to get a starting point.

written 6.7 years ago by Michael Dondrup46k

Your question title and your use of the term "consensus sequence" are misleading: that term is reserved for alignments, and it seems that you are instead looking for exactly identical sequences. Please spend at least one quarter of the time it will take us to answer your question on giving a proper example and explaining what you have already tried.

modified 6.7 years ago • written 6.7 years ago by Michael Dondrup46k
6.7 years ago by
United States
upendrakumar.devisetty350 wrote:

Sorry if I didn't explain my question well. Here is what I want.

I have three fasta files

fasta1

>seq1
ATGATG
>seq2
GATAGATA
>seq3
TGGTGG

fasta2

>m1
GATAGATA
>m2
TGGTGG
>m3
AGGAGG

fasta3

>seq1 gene1
ATGATG
>seq3 gene3
TGGTGG
>seq4 gene4
AGTGTG

What I want is a final FASTA file containing the following:

final_fasta

>seq3
TGGTGG

As you can see, this sequence is present in all three FASTA files, and the header is taken from fasta1.

I am currently using a lengthy process to achieve this.

First I BLAST fasta1 against fasta2, then take all the hits and BLAST them against fasta3.

Though it should work, it is lengthy, and I thought that if somebody had done something like this before, I could use their script or tool.

written 6.7 years ago by upendrakumar.devisetty350
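
Since the example above asks for exactly identical sequences rather than an alignment, a set intersection on the sequence strings may be enough. Below is a minimal Python sketch, not an existing tool. The function names read_fasta and common_sequences are made up for illustration, and it assumes that matching is exact and case-insensitive, that reverse complements do not count as matches, and that headers are taken from the first file. The three inputs are the example files from the post.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence may be wrapped over several lines; collect the pieces.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def common_sequences(first, *others):
    """Return records of `first` whose sequence occurs in every other input.

    Assumption: exact, case-insensitive string identity; no reverse
    complements, no partial matches. Headers come from `first`.
    """
    other_sets = [{seq.upper() for _, seq in read_fasta(o)} for o in others]
    seen = set()
    result = []
    for header, seq in read_fasta(first):
        key = seq.upper()
        if key in seen:
            continue  # report each distinct sequence once
        if all(key in s for s in other_sets):
            seen.add(key)
            result.append((header, seq))
    return result


# Example data from the post above.
fasta1 = ">seq1\nATGATG\n>seq2\nGATAGATA\n>seq3\nTGGTGG\n"
fasta2 = ">m1\nGATAGATA\n>m2\nTGGTGG\n>m3\nAGGAGG\n"
fasta3 = ">seq1 gene1\nATGATG\n>seq3 gene3\nTGGTGG\n>seq4 gene4\nAGTGTG\n"

final = common_sequences(
    fasta1.splitlines(), fasta2.splitlines(), fasta3.splitlines()
)
for header, seq in final:
    print(">" + header)
    print(seq)
```

On the example data this prints the single record seq3 / TGGTGG. Because read_fasta accepts any iterable of lines, open file handles can be passed in place of the split strings. For files of around 100 MB, holding the sequences of two files in memory as sets should be manageable on most machines.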
6.7 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

Use CD-HIT: http://weizhong-lab.ucsd.edu/cd-hit/

ADD COMMENTlink written 6.7 years ago by Damian Kao15k