Question

how to append numbers to a column using awk and R?

0

Entering edit mode

2.1 years ago

mthm ▴ 50

I have a bed file :

xyz.00001 650     730     Target "Motif:DUF" 603 686
xyz.00001 12218   12323   Target "Motif:ERV" 906 995
xyz.00001 14034   14196   Target "Motif:R1-I" 95 257
xyz.00001 15336   15393   Target "Motif:R1-I2" 5843 5900
xyz.00001 17117   17166   Target "Motif:Zin" 161 211
.
.
.

I need to append numbers to the last column, preferably to the end of "Target", so Target1, Target2, Target3,...

awk '{print "Target"NR  $s}' test2

the result is

Target1xyz.00001 650     730     Target "Motif:DUF" 603 686
Target2xyz.00001 12218   12323   Target "Motif:ERV" 906 995
Target3xyz.00001 14034   14196   Target "Motif:R1-I" 95 257
Target4xyz.00001 15336   15393   Target "Motif:R1-I2" 5843 5900
Target5xyz.00001 17117   17166   Target "Motif:Zin" 161 211

how to fix this?

Also, since I am more primitive in R, how would you do that in R?

append awk R • 1.5k views

ADD COMMENT • link updated 2.1 years ago by cpad0112 21k • written 2.1 years ago by mthm ▴ 50

0

Entering edit mode

$ awk -F "\t" -v OFS="\t" '{sub("$","_"NR,$4)}1' test.txt

xyz.00001   650 730 Target_1    "Motif:DUF" 603 686
xyz.00001   12218   12323   Target_2    "Motif:ERV" 906 995
xyz.00001   14034   14196   Target_3    "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target_4    "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target_5    "Motif:Zin" 161 211


$ awk -F "\t" -v OFS="\t" '{$4=sprintf("%s",$4"_"NR)}1' test.txt

xyz.00001   650 730 Target_1    "Motif:DUF" 603 686
xyz.00001   12218   12323   Target_2    "Motif:ERV" 906 995
xyz.00001   14034   14196   Target_3    "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target_4    "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target_5    "Motif:Zin" 161 211

ADD REPLY • link 2.1 years ago by cpad0112 21k

score 2 · Answer 1 · 2022-04-12

2

Entering edit mode

2.1 years ago

Hamid Ghaedi 3.2k

Kevin Blighe already has answered this with sed and awk, in R you may try this:

# define a vector of sequential numbers to add to the 4th column of df
seqNum = seq(1:length(df[,4]))
# Update the 4th column with appended numbers
df[,4] = paste0(df[,4], seqNum)

Result:

1 xyz.00001   650   730 Target1   Motif:DUF  603  686
2 xyz.00001 12218 12323 Target2   Motif:ERV  906  995
3 xyz.00001 14034 14196 Target3  Motif:R1-I   95  257
4 xyz.00001 15336 15393 Target4 Motif:R1-I2 5843 5900
5 xyz.00001 17117 17166 Target5   Motif:Zin  161  211

ADD COMMENT • link 2.1 years ago by Hamid Ghaedi 3.2k

0

Entering edit mode

to be simple, with base R

 df[,4]=paste(df[,4],seq_along(df[,4]), sep="_")

ADD REPLY • link 2.1 years ago by cpad0112 21k

score 0 · Answer 2 · 2022-04-12

0

Entering edit mode

2.1 years ago

Kevin Blighe 87k

Can do it with awk:

.

display file contents

cat test.txt 
xyz.00001 650     730     Target "Motif:DUF" 603 686
xyz.00001 12218   12323   Target "Motif:ERV" 906 995
xyz.00001 14034   14196   Target "Motif:R1-I" 95 257
xyz.00001 15336   15393   Target "Motif:R1-I2" 5843 5900
xyz.00001 17117   17166   Target "Motif:Zin" 161 211

.

Ensure that repeat whitespace is converted to a single tab

sed -i 's/ \+/\t/g' test.txt
cat test.txt 
xyz.00001   650 730 Target  "Motif:DUF" 603 686
xyz.00001   12218   12323   Target  "Motif:ERV" 906 995
xyz.00001   14034   14196   Target  "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target  "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target  "Motif:Zin" 161 211

.

create desired output

awk -F "\t" '{print $4NR$0}' test.txt 
Target1xyz.00001    650 730 Target  "Motif:DUF" 603 686
Target2xyz.00001    12218   12323   Target  "Motif:ERV" 906 995
Target3xyz.00001    14034   14196   Target  "Motif:R1-I"    95  257
Target4xyz.00001    15336   15393   Target  "Motif:R1-I2"   5843    5900
Target5xyz.00001    17117   17166   Target  "Motif:Zin" 161 211

.

Edit: NR is a special variable in awk, relating to the Number of Records processed

ADD COMMENT • link 2.1 years ago by Kevin Blighe 87k

0

Entering edit mode

sorry Kevin Blighe if I didn't explain clearly, I don't want to add a new "TargetX" in the beginning of the line, I want to add the number to the column 4 where Target already is. so:

xyz.00001   650   730 Target1 Motif:DUF  603  686
xyz.00001 12218 12323 Target2 Motif:ERV  906  995

ADD REPLY • link 2.1 years ago by mthm ▴ 50

2

Entering edit mode

Then try:

awk -F " " '$4=$4NR{print $0}' test2

or

awk -F "\t" '$4=$4NR{print $0}' test.txt

on the modified file. The -F parameter just changes which column separator you are using in the file. The actual change is happening before the print statement: The forth column $4 is redefined as all contents of the 4th column appended with the NR variable. If you need an underscore or such, then use $4=$4"_"NR.

ADD REPLY • link 2.1 years ago by Matthias Zepper 4.6k

1

Entering edit mode

awk -F "\t" '$4=$4NR' test.txt

should be much simpler.

ADD REPLY • link 2.1 years ago by cpad0112 21k

0

Entering edit mode

Indeed, very nifty! I had already forgotten that print $0 is the default action and can be omitted. Your proposal is now code golf worthy...

ADD REPLY • link 2.1 years ago by Matthias Zepper 4.6k

0

Entering edit mode

In a moment of haste, I had interpreted your pasted output as the desired output. Irrespective, by examining my code you should be easily able to adapt / edit it to in order to achieve what you truly want.

ADD REPLY • link 2.1 years ago by Kevin Blighe 87k

1

Entering edit mode

awk -F "\t" '{print $1"\t"$2"\t"$3"\t"$4NR"\t"$5"\t"$6"\t"$7}' test.txt
xyz.00001   650 730 Target1 "Motif:DUF" 603 686
xyz.00001   12218   12323   Target2 "Motif:ERV" 906 995
xyz.00001   14034   14196   Target3 "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target4 "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target5 "Motif:Zin" 161 211

ADD REPLY • link 2.1 years ago by Kevin Blighe 87k