Question: Create a list of sequences present in multiple FASTA files
biostars0 (United Kingdom) wrote, 5.4 years ago:

Hi, I'm trying to make a list of the amino acid sequences that are present in all of a selection of FASTA files I have. To make things confusing, they all have different feature IDs. Is there a script I can run that would do this?

 

Thanks!


Please clarify your specific problem or add additional details to highlight exactly what you need.

(Comment by Pierre Lindenbaum, 5.4 years ago)

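Per the forum rules, your comments should be posted in these comment boxes.

Yes, you can automate the renaming with sed.

For example: sed 's/^>/>file1_/' file1.fasta > file1NamesChanged.fasta (the ^ anchor restricts the substitution to header lines, which start with >, so a > elsewhere in a line is left alone)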
(Comment by Prakki Rama, 5.4 years ago)
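To apply the same idea to many files at once (steps 1 and 2 of Prakki Rama's answer: rename, then merge), here is a minimal Python sketch. It assumes plain, uncompressed FASTA files and uses the file name without its extension (`Path.stem`) as the prefix; the function name `prefix_and_merge` is hypothetical.

```python
from pathlib import Path


def prefix_and_merge(fasta_paths, merged_path):
    """Prefix each header with its file's stem and write all records to one file.

    Hypothetical helper: roughly the same as running
    sed 's/^>/>STEM_/' on every file and concatenating the outputs.
    """
    with open(merged_path, "w") as out:
        for path in map(Path, fasta_paths):
            tag = path.stem  # e.g. "file1" for file1.fasta
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # Only header lines are rewritten; sequence lines pass through.
                        line = f">{tag}_{line[1:]}"
                    out.write(line)
                    last = line
            # Guard against a missing final newline, which would otherwise
            # glue the next file's first header onto this file's last line.
            if not last.endswith("\n"):
                out.write("\n")
```

For example, `prefix_and_merge(sorted(Path("fasta_dir").glob("*.fasta")), "merged.fasta")` would process every `.fasta` file in a (hypothetical) `fasta_dir` directory. Writing the merged file outside that directory keeps it from being picked up again on a rerun.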
Prakki Rama (Singapore) wrote, 5.4 years ago:

One possible approach:

1) Prefix the headers in each FASTA file with the file name.

   For example, if a sequence in file1.fasta is named >protein1, rename it to >file1_protein1.

2) Merge all the FASTA files into a single file.

3) Run CD-HIT on the merged file, choosing parameters such as the sequence identity threshold.

CD-HIT then groups similar sequences into clusters and reports a representative sequence for each cluster. Because every header already carries the name of its source file, you can easily see which proteins are present in multiple FASTA files.

~Prakki Rama.
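After CD-HIT runs, its `.clstr` file lists each cluster as a `>Cluster N` line followed by one line per member, such as `0	150aa, >file1_protein1... *`. The minimal sketch below reports only clusters that contain at least one sequence from every file, which matches the original question. It assumes that `.clstr` layout, that the IDs in the `.clstr` file are long enough to keep the file prefix (CD-HIT truncates them by default; its `-d 0` option keeps the full ID up to the first space), and that the file tags themselves contain no underscore. The function name `clusters_in_all_files` is hypothetical.

```python
import re
from collections import defaultdict

# Captures the sequence ID in member lines like "1\t150aa, >file2_protA... at 100.00%".
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def clusters_in_all_files(clstr_path, file_tags):
    """Return {cluster name: member IDs} for clusters that contain every file tag."""
    members = defaultdict(list)
    current = None
    with open(clstr_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">Cluster"):
                current = line[1:]  # e.g. "Cluster 0"
            elif line and current is not None:
                match = MEMBER_RE.search(line)
                if match:
                    members[current].append(match.group(1))

    wanted = set(file_tags)
    result = {}
    for cluster, ids in members.items():
        # Assumes headers look like "<tag>_<original ID>" with no "_" inside the tag.
        tags = {seq_id.split("_", 1)[0] for seq_id in ids}
        if wanted <= tags:
            result[cluster] = ids
    return result
```

For example, if CD-HIT was run with `-o merged_cdhit`, the clusters would be read from `merged_cdhit.clstr` with `clusters_in_all_files("merged_cdhit.clstr", ["file1", "file2", "file3"])`.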

biostars0 (United Kingdom) wrote, 5.4 years ago:

Thanks, Prakki. Is there a way to automate the renaming? There are quite a few sequences, and renaming them by hand would take a long time.
