Question: Extracting FASTA sequences from a proteome file by reference to a transcription factor (TF) FASTA file of the same organism
4.2 years ago, kws15 wrote:

Hi everyone,

Basically, I have two large FASTA files. The first is the proteome (all the protein sequences); the second contains the transcription factor sequences of the same organism. Is there any way to extract the non-transcription-factor sequences as a FASTA file using these two files? Many thanks.

4.2 years ago, natasha.sernova wrote:

As far as I understand, your task is the following: you have two FASTA files. One contains all the proteins of your organism, and the second contains only the transcription factors from the same organism. You need to select the proteins from the whole-proteome file that are not in the second file. Is that correct?

Look at the following post

A: Print Different Id From Sequence Comparison Of Two Fasta Files

There are scripts in several languages in that post, so you will very likely find something that suits you.

For example, the command-line tool diff should be fine for your problem, in my opinion, though it compares files line by line, so it is only reliable when both files use the same record order and line wrapping.

Or there is a Perl solution with a hash of "unseen" proteins: see section 5.11 of the Perl Cookbook, "Finding Common or Different Keys in Two Hashes".

So you can use whatever you prefer.
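
To make the hash-of-IDs idea concrete, here is a minimal Python sketch (not taken from the linked post). It assumes that the same protein carries the same identifier in both files and that the identifier is the first whitespace-delimited token of the header line; if your headers differ, adjust record_id accordingly. All function names and file names here are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Assumption: the ID is the first whitespace-delimited token of the header."""
    parts = header.split()
    return parts[0] if parts else ""


def non_tf_records(proteome_lines, tf_lines):
    """Yield proteome records whose ID does not occur in the TF file."""
    tf_ids = {record_id(h) for h, _ in read_fasta(tf_lines)}
    for header, seq in read_fasta(proteome_lines):
        if record_id(header) not in tf_ids:
            yield header, seq


def write_fasta(records, out, width=60):
    """Write records to a file-like object, wrapping sequences at `width`."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


def extract_non_tf(proteome_path, tf_path, out_path):
    """Read both FASTA files and write the non-TF proteins to out_path."""
    with open(proteome_path) as prot, open(tf_path) as tf, open(out_path, "w") as out:
        write_fasta(non_tf_records(prot, tf), out)

# Example call (file names are placeholders):
# extract_non_tf("proteome.fasta", "tf.fasta", "non_tf.fasta")
```

Only the TF identifiers are held in memory, so this should cope with large proteome files. If the two files use different identifiers for the same protein, comparing on the sequence itself instead of the ID would be the analogous approach.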
