how to append numbers to a column using awk and R?
2
0
Entering edit mode
2.1 years ago
mthm ▴ 50

I have a bed file :

xyz.00001 650     730     Target "Motif:DUF" 603 686
xyz.00001 12218   12323   Target "Motif:ERV" 906 995
xyz.00001 14034   14196   Target "Motif:R1-I" 95 257
xyz.00001 15336   15393   Target "Motif:R1-I2" 5843 5900
xyz.00001 17117   17166   Target "Motif:Zin" 161 211
.
.
.

I need to append numbers to the last column, preferably to the end of "Target", so Target1, Target2, Target3,...

awk '{print "Target"NR  $s}' test2

the result is

Target1xyz.00001 650     730     Target "Motif:DUF" 603 686
Target2xyz.00001 12218   12323   Target "Motif:ERV" 906 995
Target3xyz.00001 14034   14196   Target "Motif:R1-I" 95 257
Target4xyz.00001 15336   15393   Target "Motif:R1-I2" 5843 5900
Target5xyz.00001 17117   17166   Target "Motif:Zin" 161 211

how to fix this?

Also, since I am more primitive in R, how would you do that in R?

append awk R • 1.5k views
ADD COMMENT
0
Entering edit mode
$ awk -F "\t" -v OFS="\t" '{sub("$","_"NR,$4)}1' test.txt

xyz.00001   650 730 Target_1    "Motif:DUF" 603 686
xyz.00001   12218   12323   Target_2    "Motif:ERV" 906 995
xyz.00001   14034   14196   Target_3    "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target_4    "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target_5    "Motif:Zin" 161 211


$ awk -F "\t" -v OFS="\t" '{$4=sprintf("%s",$4"_"NR)}1' test.txt

xyz.00001   650 730 Target_1    "Motif:DUF" 603 686
xyz.00001   12218   12323   Target_2    "Motif:ERV" 906 995
xyz.00001   14034   14196   Target_3    "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target_4    "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target_5    "Motif:Zin" 161 211
ADD REPLY
2
Entering edit mode
2.1 years ago

Kevin Blighe already has answered this with sed and awk, in R you may try this:

# define a vector of sequential numbers to add to the 4th column of df
seqNum = seq(1:length(df[,4]))
# Update the 4th column with appended numbers
df[,4] = paste0(df[,4], seqNum)

Result:

1 xyz.00001   650   730 Target1   Motif:DUF  603  686
2 xyz.00001 12218 12323 Target2   Motif:ERV  906  995
3 xyz.00001 14034 14196 Target3  Motif:R1-I   95  257
4 xyz.00001 15336 15393 Target4 Motif:R1-I2 5843 5900
5 xyz.00001 17117 17166 Target5   Motif:Zin  161  211
ADD COMMENT
0
Entering edit mode

to be simple, with base R

 df[,4]=paste(df[,4],seq_along(df[,4]), sep="_")
ADD REPLY
0
Entering edit mode
2.1 years ago

Can do it with awk:

.

display file contents

cat test.txt 
xyz.00001 650     730     Target "Motif:DUF" 603 686
xyz.00001 12218   12323   Target "Motif:ERV" 906 995
xyz.00001 14034   14196   Target "Motif:R1-I" 95 257
xyz.00001 15336   15393   Target "Motif:R1-I2" 5843 5900
xyz.00001 17117   17166   Target "Motif:Zin" 161 211

.

Ensure that repeat whitespace is converted to a single tab

sed -i 's/ \+/\t/g' test.txt
cat test.txt 
xyz.00001   650 730 Target  "Motif:DUF" 603 686
xyz.00001   12218   12323   Target  "Motif:ERV" 906 995
xyz.00001   14034   14196   Target  "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target  "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target  "Motif:Zin" 161 211

.

create desired output

awk -F "\t" '{print $4NR$0}' test.txt 
Target1xyz.00001    650 730 Target  "Motif:DUF" 603 686
Target2xyz.00001    12218   12323   Target  "Motif:ERV" 906 995
Target3xyz.00001    14034   14196   Target  "Motif:R1-I"    95  257
Target4xyz.00001    15336   15393   Target  "Motif:R1-I2"   5843    5900
Target5xyz.00001    17117   17166   Target  "Motif:Zin" 161 211


.

Edit: NR is a special variable in awk, relating to the Number of Records processed

ADD COMMENT
0
Entering edit mode

sorry Kevin Blighe if I didn't explain clearly, I don't want to add a new "TargetX" in the beginning of the line, I want to add the number to the column 4 where Target already is. so:

xyz.00001   650   730 Target1 Motif:DUF  603  686
xyz.00001 12218 12323 Target2 Motif:ERV  906  995
ADD REPLY
2
Entering edit mode

Then try:

awk -F " " '$4=$4NR{print $0}' test2

or

awk -F "\t" '$4=$4NR{print $0}' test.txt

on the modified file. The -F parameter just changes which column separator you are using in the file. The actual change is happening before the print statement: The forth column $4 is redefined as all contents of the 4th column appended with the NR variable. If you need an underscore or such, then use $4=$4"_"NR.

ADD REPLY
1
Entering edit mode
awk -F "\t" '$4=$4NR' test.txt

should be much simpler.

ADD REPLY
0
Entering edit mode

Indeed, very nifty! I had already forgotten that print $0 is the default action and can be omitted. Your proposal is now code golf worthy...

ADD REPLY
0
Entering edit mode

In a moment of haste, I had interpreted your pasted output as the desired output. Irrespective, by examining my code you should be easily able to adapt / edit it to in order to achieve what you truly want.

ADD REPLY
1
Entering edit mode
awk -F "\t" '{print $1"\t"$2"\t"$3"\t"$4NR"\t"$5"\t"$6"\t"$7}' test.txt
xyz.00001   650 730 Target1 "Motif:DUF" 603 686
xyz.00001   12218   12323   Target2 "Motif:ERV" 906 995
xyz.00001   14034   14196   Target3 "Motif:R1-I"    95  257
xyz.00001   15336   15393   Target4 "Motif:R1-I2"   5843    5900
xyz.00001   17117   17166   Target5 "Motif:Zin" 161 211
ADD REPLY

Login before adding your answer.

Traffic: 1598 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6