Hi, I have a FASTA file and I want to extract only the sequences and the parts of the header before and after the '|' (e.g. P00533 EGFR_HUMAN Epidermal growth factor receptor) if OS=Homo sapiens. Is this possible with a quick awk one-liner?
>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor OS=Homo sapiens GN=EGFR PE=1 SV=2
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA
VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF
QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC
TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV
VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK
NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF
>sp|P31749|AKT1_HUMAN RAC-alpha serine/threonine-protein kinase OS=Homo sapiens GN=AKT1 PE=1
MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQC
QLMKTERPRPNTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDF
RSGSPSDNSGAEEMEVSLAKPKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKI
>tr|P91634|P91634_DROME PI-3 kinase OS=Drosophila melanogaster GN=Pi3K92E PE=1 SV=1
MNMMDNRALAYVAHQPKYETPPEEAEPPCMRFSVNLWKNEMLNWVDLICLLPNGFLLELR
VNPANTIQVIKVEMVNQAKQMPLGYVIKEACEYQVYGISTFNIEPYTDETKRLSEVQPYF
GILSLGERTDTTSFSSDYELTKMVNGMIGTTFDHNRTHGSPEIDDFRLYMTQTCDNIELE
Thank you very much Kevin. I'm new to bash, I understand from your code how you separate the line and only print either side of the '|' with the 'split(a[1], b, "|"); print ">"b[2]" "b[3]}'. Can you explain what the bprint is doing??
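For anyone following along, here is a minimal sketch of the kind of awk command being discussed. It is not necessarily Kevin's exact code (which is not quoted in full here); it only assumes UniProt-style headers like the ones in the question, where the description is followed by " OS=". The temporary file and the `keep` flag are illustrative names, and the sequences are truncated to one line each.

```shell
# Hypothetical two-record test file, headers taken from the question
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor OS=Homo sapiens GN=EGFR PE=1 SV=2
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
>tr|P91634|P91634_DROME PI-3 kinase OS=Drosophila melanogaster GN=Pi3K92E PE=1 SV=1
MNMMDNRALAYVAHQPKYETPPEEAEPPCMRFSVNLWKNEMLNWVDLICLLPNGFLLELR
EOF

result=$(awk '
/^>/ {
    # Header line: decide whether this record is human
    keep = ($0 ~ /OS=Homo sapiens/)
    if (keep) {
        split($0, a, " OS=")      # a[1] = header text before " OS="
        split(a[1], b, "|")       # b[1] = ">sp", b[2] = accession, b[3] = entry name + description
        print ">" b[2] " " b[3]
    }
    next
}
keep    # sequence line: print only while inside a kept record
' "$fasta")

printf '%s\n' "$result"
rm -f "$fasta"
```

The first `split` cuts the header at " OS=", and the second breaks what is left on '|', so `b[2]` holds the accession and `b[3]` holds the entry name plus description. The bare `keep` pattern at the end prints sequence lines only for records whose header matched.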
Hey Zoe, could you repost this as a comment to my answer? Just to maintain fluidity of the thread. You can delete this and then re-post above. I have answered your comment already there!
Awesome, thank you Kevin!
Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your post but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked. If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.