Question: (Closed) compare IDs and extract corresponding similar sequences with IDs
3.9 years ago, ruchi1st2002 (United States) wrote:

Hi,

I am stuck on the same kind of problem. As a newcomer to bioinformatics, I am trying to extract the FASTA sequences from File_2 whose IDs match the IDs listed in File_1.

File_1.fasta

>comp148_c0_seq1

>comp169_c0_seq1        

>comp258_c0_seq1        

>comp285_c0_seq1        

>comp350_c0_seq1       

>comp424_c0_seq1       

>comp783_c0_seq1        

>comp1089_c0_seq1   

File_2.fasta

>comp6_c1_seq1  -1      22      237

MAILRFMDSWVVGVNVCGKRPRRFVDPINMIRETIIRVHVRPFGVWISIICLIISLTSQCWKEWRRLLIRRF

>comp11_c0_seq1 -2      35      358

MKEDDKVIEDDEKAEGSKGDIQKEEPGADDETEESNKLIGDNQGKDEANADEDDPQNEETIDKSEENKQREEQQQITLLHFIGRSFASLLKNLLKKTCPSAAEGNNYY

>comp42_c0_seq1 +3      114     305

MPSTLKFLLIYESHWYDKNSEKLVNEFLSLLAHCTQLRYMPILLEDYDLLKLIEEKNTRQFDKI

>comp43_c0_seq1 -2      38      298

MSSSNWAFYIGVSNGHVHDNLLVCKAPNCYCFPTRMDHCYIGGTQHHFEWPTDAISRPQWNGIGDTLGCGILLNPKNELAIFFTANG

>comp48_c0_seq1 +3      18      242

MLYKSLINSKSLRGKTPAEVVNMFANDGQRIFDAVTFAPLVLIGPLVLVGGLIYLLRVIGPVSLLAVSVFLIFDF

>comp53_c0_seq1 -1      55      312

MIGNRLRVKRDKVTLKMEISHCLHSQIIGRGGRNTQKIMRDTGCHIHFPDSNKCLTNPVTMPQQAKNDQVSISGCAKDVEKAREML

>comp56_c0_seq1 -2      110     379

MNNARLNAEINELHAAIHANVHYGRPFKPSHISMNKSQATDRSDNNVCGQLATIDNKNENDHDNDNDDNEANDETRERRRFTVADYMPGG

>comp56_c1_seq1 -1      52      408

MWIWSGPIFSTTILFGHISPFLKPGWRRRAYFCPNFPRRLRVYWATTALLSLLIITFLLGRHFFLGVPSQLTTSPEAIFPSTLGLDCGNVTLSFTAAANDEKVPSNQTLAADKVVASFA

>comp72_c0_seq1 +2      14      271

MYALTWNGLMELKLGADTCPDVNVNWEVFGERGLKSISLFAVADKVFIFSTPNELLVYDRGSATISQFPIASPPLKTLLAVSQSSQ

>comp74_c0_seq1 +3      96      305

MVIPFIFLKAQTIQLEAKDESNFCQNRVDLLAHEVEVRVRIGKRQKRALASSAGCCSCGRGPMGERGAPG

>comp79_c0_seq1 -1      31      213

MASAMFCCQVCMLLSSAYHIFGCHSPNRRKRWLRADLFGVSAGLIGLYLSGLYTSFYHFPV

I used awk:

diff <(cat file_1 | grep ">" | sort)  <(cat file_2 | grep ">" | sort)  | grep "^<" | awk -F\> '{print $2}'   

I am getting only the IDs; I also need the sequence corresponding to each ID.

Can anyone suggest "awk" or perl code for the same? 

 

Thanks

RV

Tags: awk, perl
written 3.9 years ago by ruchi1st2002 (0)

Hello ruchi1st2002!

We believe that this post does not fit the main topic of this site.

Please search the forum. There are dozens of questions addressing every variant of this problem, and yours is the simplest case of all.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree, please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

written 3.9 years ago by RamRS (22k)

OK, no problem. I followed the post "find similar sequences between different sources" (https://www.biostars.org/p/107403/), but I was unable to get an answer from it or from a few others. I also added this question as a comment on other people's posts, but I was asked to post it as a unique question. That is why I posted it separately.

Thanks

RV

written 3.9 years ago by ruchi1st2002 (0)

The keywords you are looking for are "FASTA", "id", "extract". Use those and you will see at least 10 posts along the lines of your query.

written 3.9 years ago by RamRS (22k)

OK, thanks! I will dig deeper.

RV

written 3.9 years ago by ruchi1st2002 (0)
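
For anyone landing on this closed thread, here is a minimal awk sketch of the extraction asked about in the question. It is one possible approach, not taken from any of the posts mentioned above. It assumes that the ID is the first whitespace-delimited word of each header line (so ">comp6_c1_seq1  -1  22  237" has the ID "comp6_c1_seq1"), that the ID list is not empty, and that both files fit the layout shown in the question. The function name extract_by_id is made up for illustration.

```shell
# extract_by_id IDS_FILE FASTA_FILE   (hypothetical helper name)
# Prints every record of FASTA_FILE whose ID is listed in IDS_FILE.
# Assumption: the ID is the first word of the header, without the ">".
extract_by_id() {
    awk '
        # First file: remember each wanted ID; ignore blank lines.
        NR == FNR {
            if (/^>/) { id = $1; sub(/^>/, "", id); want[id] = 1 }
            next
        }
        # Second file: each header decides whether its record is kept.
        /^>/ { id = $1; sub(/^>/, "", id); keep = (id in want) }
        # Print header and sequence lines of kept records, skipping blanks.
        keep && NF
    ' "$1" "$2"
}
```

Usage would be along the lines of: extract_by_id File_1.fasta File_2.fasta > matched.fasta. Note that in the excerpt posted in the question, none of the File_1 IDs occur in File_2, so on that excerpt alone the output is empty; presumably the full files overlap.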

The thread is closed. No new answers may be added.