8.7 years ago
ruchi1st2002
Hi,
I am stuck on the same kind of problem. As a newbie in bioinformatics, I am trying to extract the FASTA sequences from file_2
whose IDs are listed in another file, file_1.
File_1.fasta (list of IDs)
>comp148_c0_seq1
>comp169_c0_seq1
>comp258_c0_seq1
>comp285_c0_seq1
>comp350_c0_seq1
>comp424_c0_seq1
>comp783_c0_seq1
>comp1089_c0_seq1
File_2.fasta (sequences)
>comp6_c1_seq1 -1 22 237
MAILRFMDSWVVGVNVCGKRPRRFVDPINMIRETIIRVHVRPFGVWISIICLIISLTSQCWKEWRRLLIRRF
>comp11_c0_seq1 -2 35 358
MKEDDKVIEDDEKAEGSKGDIQKEEPGADDETEESNKLIGDNQGKDEANADEDDPQNEETIDKSEENKQREEQQQITLLHFIGRSFASLLKNLLKKTCPSAAEGNNYY
>comp42_c0_seq1 +3 114 305
MPSTLKFLLIYESHWYDKNSEKLVNEFLSLLAHCTQLRYMPILLEDYDLLKLIEEKNTRQFDKI
>comp43_c0_seq1 -2 38 298
MSSSNWAFYIGVSNGHVHDNLLVCKAPNCYCFPTRMDHCYIGGTQHHFEWPTDAISRPQWNGIGDTLGCGILLNPKNELAIFFTANG
>comp48_c0_seq1 +3 18 242
MLYKSLINSKSLRGKTPAEVVNMFANDGQRIFDAVTFAPLVLIGPLVLVGGLIYLLRVIGPVSLLAVSVFLIFDF
>comp53_c0_seq1 -1 55 312
MIGNRLRVKRDKVTLKMEISHCLHSQIIGRGGRNTQKIMRDTGCHIHFPDSNKCLTNPVTMPQQAKNDQVSISGCAKDVEKAREML
>comp56_c0_seq1 -2 110 379
MNNARLNAEINELHAAIHANVHYGRPFKPSHISMNKSQATDRSDNNVCGQLATIDNKNENDHDNDNDDNEANDETRERRRFTVADYMPGG
>comp56_c1_seq1 -1 52 408
MWIWSGPIFSTTILFGHISPFLKPGWRRRAYFCPNFPRRLRVYWATTALLSLLIITFLLGRHFFLGVPSQLTTSPEAIFPSTLGLDCGNVTLSFTAAANDEKVPSNQTLAADKVVASFA
>comp72_c0_seq1 +2 14 271
MYALTWNGLMELKLGADTCPDVNVNWEVFGERGLKSISLFAVADKVFIFSTPNELLVYDRGSATISQFPIASPPLKTLLAVSQSSQ
>comp74_c0_seq1 +3 96 305
MVIPFIFLKAQTIQLEAKDESNFCQNRVDLLAHEVEVRVRIGKRQKRALASSAGCCSCGRGPMGERGAPG
>comp79_c0_seq1 -1 31 213
MASAMFCCQVCMLLSSAYHIFGCHSPNRRKRWLRADLFGVSAGLIGLYLSGLYTSFYHFPV
I tried this command (diff, grep and awk):
diff <(grep ">" file_1 | sort) <(grep ">" file_2 | sort) | grep "^<" | awk -F'>' '{print $2}'
This gives me only the IDs, but I also need the sequence that belongs to each ID.
Can anyone suggest awk or Perl code to do this?
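One possible approach, sketched here as a minimal awk example under stated assumptions: the diff command above compares whole header lines and prints only the lines unique to file_1, so it can never return sequences, and headers in file_2 carry extra fields (e.g. `-1 22 237`) that stop them from matching exactly anyway. The sketch below instead loads the IDs from file_1 into an awk associative array and then prints every record of file_2 whose ID is in that array. It assumes an ID is the first whitespace-separated word of a header (without the leading `>`); a record may span several sequence lines. The function name `extract_by_ids` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every FASTA record in a sequence file whose ID
# (first whitespace-separated word of the header, minus ">") appears in an
# ID list file. Multi-line sequences are handled.
extract_by_ids() {
    local id_file=$1 fasta_file=$2
    awk '
        # Strip Windows-style carriage returns, if any.
        { sub(/\r$/, "") }
        # First file (the ID list): remember each ID without ">".
        FILENAME == ARGV[1] {
            sub(/^>/, "", $1)
            if ($1 != "") wanted[$1] = 1
            next
        }
        # Second file: on each header, decide whether to keep this record.
        /^>/ {
            id = substr($1, 2)
            keep = (id in wanted)
        }
        # Print the header and sequence lines of wanted records.
        keep
    ' "$id_file" "$fasta_file"
}

# Usage, with the file names from the question:
# extract_by_ids file_1 file_2 > selected.fasta
```

Keeping the IDs in an array means file_2 is read only once and neither file needs to be sorted. Output headers are printed unchanged, including any extra fields.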
Thanks
RV