Question: Extracting the proteome FASTA sequences that are not in a transcription factor (TF) FASTA file of the same organism
15 months ago, kws15 (40) wrote:

Hi everyone,

I have two large FASTA files. The first is the proteome (all the protein sequences), and the second contains the transcription factor sequences of the same organism. Is there a way to use these two files to extract the non-transcription-factor sequences as a FASTA file? Many thanks.

15 months ago, natasha.sernova (2.5k) wrote:

As far as I understand, your task is the following: you have two FASTA files. One contains all the proteins of your organism of interest; the second contains only the transcription factors from the same organism. You need to select the proteins from the whole-proteome file that are not in the second file with the transcription factors. Is that correct?

Have a look at the following post:

A: Print Different Id From Sequence Comparison Of Two Fasta Files

It has scripts in several languages, so you will very likely find something that suits you.

For example, the bash command-line tool diff should be fine for your problem, in my opinion.

Or a Perl solution that uses a hash of "unseen" proteins.

See Recipe 5.11 in the Perl Cookbook, "Finding Common or Different Keys in Two Hashes".

So you can use whatever you prefer.
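
If it helps, here is a minimal Python sketch of the same "hash of seen keys" idea, using a set instead of a Perl hash. It is not taken from the linked post. The function names (`read_fasta`, `record_id`, `non_tf_records`, `write_fasta`) are made up for illustration, and it assumes that a protein has the same ID in both files, where the ID is the first whitespace-delimited token of the header line. If your two files use different ID schemes, you would need to compare on the sequence itself instead.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Assumption: the ID is the first whitespace-delimited token of the header."""
    parts = header.split()
    return parts[0] if parts else ""


def non_tf_records(proteome_handle, tf_handle):
    """Yield proteome records whose ID does not occur in the TF file."""
    tf_ids = {record_id(header) for header, _ in read_fasta(tf_handle)}
    for header, seq in read_fasta(proteome_handle):
        if record_id(header) not in tf_ids:
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Tiny in-memory demonstration with made-up records.
demo_proteome = io.StringIO(">P1 kinase\nMKTAAA\n>P2 some TF\nMGGCCC\n>P3\nMLLWWW\n")
demo_tfs = io.StringIO(">P2 some TF\nMGGCCC\n")
demo_out = io.StringIO()
write_fasta(non_tf_records(demo_proteome, demo_tfs), demo_out)
print(demo_out.getvalue(), end="")
```

For real data you would replace the `io.StringIO` objects with files opened via `open(...)`, for example a proteome file, a TF file, and an output file opened for writing (the file names are up to you). Only the TF IDs are kept in memory, so the proteome file can be large.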
