Question: extracting the sequences from a proteome FASTA file that are not in a second FASTA file (transcription factors) of the same organism
kws15 wrote (19 months ago):

Hi everyone,

Basically, I have two large FASTA files. The first is the proteome (all the protein sequences); the second contains the transcription factor sequences of the same organism. Is there any way to use these two files to extract the non-transcription-factor sequences as a FASTA file? Many thanks.

natasha.sernova wrote (19 months ago):

As far as I understand, your task is the following: you have two FASTA files. One contains all the proteins of your favorite organism; the second contains only the transcription factors from the same organism. You need to select the proteins from the whole-proteome file that are not in the transcription factor file. Is that correct?

Look at the following post

A: Print Different Id From Sequence Comparison Of Two Fasta Files

It contains scripts in several languages, so you will most likely find something that suits you.

For example, the bash command-line tool diff should be OK for your problem, in my opinion.

Or use a Perl solution with a hash of "unseen" proteins, as described in Chapter 5.11 of the Perl Cookbook, "Finding Common or Different Keys in Two Hashes":

http://www.geos.ed.ac.uk/~bmg/software/Perl%20Books/OReilly.Perl.Cookbook.pdf

So you can use whatever you prefer.
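
If you prefer Python, the same "keys in one hash but not the other" idea can be sketched roughly as below. This is only a minimal sketch, not code from the linked post or the Cookbook. It assumes that both files use the same sequence IDs and that the ID is the first whitespace-delimited token of each header line; the function names (read_fasta, record_id, write_non_tf) are made up for illustration.

```python
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (header without '>')."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None and line:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Assumption: the ID is the first whitespace-delimited token of the header."""
    tokens = header.split()
    return tokens[0] if tokens else ""


def write_non_tf(proteome_path, tf_path, out_path, width=60):
    """Write proteome records whose ID is absent from the TF file; return the count."""
    # Collect the transcription factor IDs in a set (the "seen" keys).
    tf_ids = {record_id(header) for header, _ in read_fasta(tf_path)}
    kept = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(proteome_path):
            if record_id(header) in tf_ids:
                continue  # this protein is a transcription factor, skip it
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical usage: python script.py proteome.fasta tf.fasta non_tf.fasta
    if len(sys.argv) == 4:
        n = write_non_tf(sys.argv[1], sys.argv[2], sys.argv[3])
        print(f"wrote {n} non-TF sequences", file=sys.stderr)
```

If the two files do not share IDs (for example, they come from different databases), the same structure should still work if you build the set from the sequences themselves instead of the IDs, provided the sequences are identical in both files.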
