Question

Extract text between delimiters and use it to rename the same file


4.4 years ago

akansha.gitanjali ▴ 30

I have a file that looks like this:

>sp|Q5T4S7|UBR4_HUMAN E3 ubiquitin-protein ligase UBR4 OS=Homo sapiens OX=9606 GN=UBR4 PE=1 SV=1
362 S AGC AQQVRTGSTSSKEDD 2.108 0.386
364 S AGC QVRTGSTSSKEDDYE 0.556 0.386
555 S AGC LQRQRKGSMSSDASA 4.466 0.386
625 S AGC ESSPRVKSPSKQAPG 2.518 0.386
904 T AGC DSNSRRATTPLYHGF 3.45 0.386
1049 S AGC SSRLRISSYVNWIKD 0.972 0.386
1473 T AGC AWLTRMTTSPPKDSD 0.463 0.386
1504 S AGC TYIVRENSQVGEGVC 1.114 0.386
1787 S AGC EEKPKKSSLCRTVEG 1.593 0.386
1941 T AGC DSSKRKLTLTRLASA 1.859 0.386

I used cut -d'|' -f2 output2.txt | head -1 to output Q5T4S7. Now I want to use this text to rename the same file. How do I do that?

linux shell bash

updated 4.4 years ago by Ram 43k • written 4.4 years ago by akansha.gitanjali ▴ 30


Assuming that each file contains only one header and that the protein ID always sits in the same position as in the example file, please try the following bash script:

for i in *.fa; do echo cp "$i" "$(awk -F "|" '/^>/ { print $2".fasta"; exit }' "$i")"; done

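As written, the script is a dry run: it only prints the cp commands that would copy each old file to its new name. Once you have checked the old and new file names, remove echo to actually perform the copy. The new files are given the .fasta extension so that the *.fa glob does not pick up the newly created files if the loop is run again.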
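For illustration, the same loop can be wrapped in a small function once the dry run looks right. This is only a sketch under the same assumptions as above (one header per file, ID in the second |-separated field); the function name copy_by_header_id and the skip-if-exists behaviour are additions for this example and not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: copy every *.fa file in a directory to <ID>.fasta, where <ID> is the
# second '|'-separated field of the first header line.
# copy_by_header_id is a hypothetical helper name, not from the original answer.
copy_by_header_id() {
    local dir=$1 f id target
    for f in "$dir"/*.fa; do
        [ -e "$f" ] || continue      # the glob matched nothing
        # Read the ID from the first header line only, then stop.
        id=$(awk -F '|' '/^>/ { print $2; exit }' "$f")
        if [ -z "$id" ]; then
            echo "skipping $f: no ID found in header" >&2
            continue
        fi
        target="$dir/$id.fasta"
        if [ -e "$target" ]; then
            echo "skipping $f: $target already exists" >&2
            continue
        fi
        cp -- "$f" "$target"
    done
}
```

Calling copy_by_header_id . would process the current directory; the original .fa files are left in place, as with cp in the loop above.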

4.4 years ago by cpad0112 21k

score 1 · Answer 1 · 2019-12-13

Please note that this is a Linux/Unix question and not necessarily a bioinformatics one. You should search Stack Overflow for answers to questions like these.

Copying over my answer from the other location where you asked this question (which is something you should not do):

Here is a two-step process that is easier to understand:

NEW_NAME=$(head -n 1 output.txt | cut -d'|' -f2)
mv output.txt "$NEW_NAME"
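If the first line might not contain a | at all, or a file with the new name might already exist, a slightly more defensive variant of the two steps could look like the sketch below. The function name rename_by_header and the two checks are assumptions added for illustration; they are not part of the answer above.

```shell
#!/usr/bin/env bash
# Sketch: rename a file to the second '|'-separated field of its first line.
# rename_by_header is a hypothetical helper name used for illustration.
rename_by_header() {
    local file=$1 new_name target
    # -s makes cut print nothing when the line has no '|' delimiter,
    # instead of echoing the whole line back.
    new_name=$(head -n 1 "$file" | cut -s -d'|' -f2)
    if [ -z "$new_name" ]; then
        echo "no ID found in first line of $file" >&2
        return 1
    fi
    # Keep the renamed file in the same directory as the original.
    target="$(dirname "$file")/$new_name"
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    mv -- "$file" "$target"
}
```

With the example file from the question, rename_by_header output.txt would rename it to Q5T4S7.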

And here is a single line that uses command substitution, which runs the pipeline in a subshell, to build the new name. I personally prefer this one, as it is an extension of a more generic renaming operation (mv $filename ${filename/pattern/replacement}):

mv output.txt "$(head -n 1 output.txt | cut -d'|' -f2)"
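For readers who have not seen the generic ${filename/pattern/replacement} form mentioned above, here is a minimal sketch of how bash expands it. The file name sample_raw.txt is made up for the example.

```shell
#!/usr/bin/env bash
# Bash parameter expansion: replace the first match of a pattern in a variable.
filename="sample_raw.txt"            # hypothetical file name for illustration
new_name=${filename/raw/clean}       # first "raw" becomes "clean"
echo "$new_name"                     # prints: sample_clean.txt
# A real rename would then be: mv -- "$filename" "$new_name"
```

This form only works when the new name can be derived from the old one; in the question the new name comes from the file's contents, which is why command substitution is used instead.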

PS: I also changed the order of operations from cut | head to head | cut. With head first, only the first line is passed on, so cut does not have to process the whole file.