Question

Renaming and editing a bed file

7.2 years ago

Debbie ▴ 10

I have around 80 BED files, each with only the first 3 columns (example: X2_example.bed, where X2 is the gene name). I want to add a 4th column with the gene name and rename each file (examples attached: X2_example_edited.bed, Y2_example_edited.bed, and so on), then merge these files into one BED file.

I can add the 4th column with the gene name and save the file under a different name with this command:

sed 's/$/\tX2/' < X2_example.bed > X2_example_edited.bed

This is the generated bed file

chr17 42276210 42276219 X2

chr17 42297938 42297947 X2

chr17 42276210 42276219 X2

chr17 42297938 42297947 X2

But I have to do this separately for each BED file. Is there a way to extract the gene name from the file name (e.g. X2 from X2_example.bed), add it as the 4th column of the BED file, and save the result as X2_example_edited.bed?

I can extract the gene name from the file name with: echo "X2_example.bed" | awk -F'[_.]' '{print $1}'

However, as I have too many files I am looking for a way to generate a loop to automate this.

I also need to merge all the generated BED files, which I can do with:

cat *_edited.bed >output.bed

However, this gives an error (see attached example: output.bed): the last line of one file and the first line of the next file end up on the same line.

chr3 18467066 18467075 Y2chr17 42276210 42276219 X2

I know this must be a very basic thing, but I am new to this analysis and have limited knowledge. Thanks in advance

ChIP-Seq Bed sed awk • 2.8k views

updated 7.2 years ago by Alex Reynolds 35k • written 7.2 years ago by Debbie ▴ 10

score 0 · Answer 1 · 2017-01-19

7.2 years ago

Alex Reynolds 35k

Here's a way to do this with BEDOPS bedops --everything and a for loop in bash:

Note that the sed command needs a literal tab before ${title}. In bash you can write it as $'\t', or press Control-V and then Tab to type one directly:

#!/bin/bash
for file in `find . -name "*.bed" -maxdepth 1`
do
   # gene name = text before the first "_" or "." in the file name
   title=`basename "${file}" | awk -F'[_.]' '{print $1}'`
   # append a tab and the gene name to the end of every line
   sed 's/$/'$'\t'"${title}"'/' "${file}" > "${file}.edited.bed"
done
bedops --everything *.edited.bed > union.bed

Here's another way to do it without sed:

#!/bin/bash
for file in `find . -name "*.bed" -maxdepth 1`
do
   title=`basename "${file}" | awk -F'[_.]' '{print $1}'`
   awk -v title="${title}" '{print $0"\t"title;}' "${file}" > "${file}.edited.bed"
done
bedops --everything *.edited.bed > union.bed
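
If BEDOPS is not available, the same steps can be sketched with awk, cat, and sort alone. The run-together line in the question (`Y2chr17`) is what `cat` produces when a file's last line has no trailing newline. awk's `print` ends every output line with a newline, so rewriting each file through awk before concatenating avoids the problem. The sketch below is only an illustration: it builds two small sample files in a temporary directory from rows quoted in the question, and the names `add_gene_column` and `merged.bed` are made up here. Outputs are named `X2_example_edited.bed`, as the question asked.

```shell
#!/bin/bash
set -euo pipefail

# Append the gene name (file name text before the first "_") as a 4th column
# and write the result next to the input as <name>_edited.bed.
add_gene_column() {
    local file="$1"
    local base gene
    base=$(basename "${file}" .bed)
    gene="${base%%_*}"
    # awk terminates every output line with a newline,
    # even when the input's last line lacks one
    awk -v gene="${gene}" 'BEGIN { OFS = "\t" } { print $1, $2, $3, gene }' "${file}" \
        > "$(dirname "${file}")/${base}_edited.bed"
}

# Sample data (illustrative); neither file ends with a newline
workdir=$(mktemp -d)
printf 'chr17\t42276210\t42276219\nchr17\t42297938\t42297947' > "${workdir}/X2_example.bed"
printf 'chr3\t18467066\t18467075' > "${workdir}/Y2_example.bed"

for file in "${workdir}"/*_example.bed; do
    add_gene_column "${file}"
done

# Concatenate, then sort by chromosome name and numeric start position
cat "${workdir}"/*_edited.bed | sort -k1,1 -k2,2n > "${workdir}/merged.bed"
cat "${workdir}/merged.bed"
```

To run this on real data, drop the sample-data lines and point `workdir` at the directory holding the BED files. The glob `*_example.bed` assumes the naming pattern shown in the question.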

7.2 years ago by Alex Reynolds 35k

I am getting a warning:

find: warning: you have specified the -maxdepth option after a non-option argument -name, but options are not positional (-maxdepth affects tests specified before it as well as those specified after it). Please specify options before other arguments.

7.2 years ago by Debbie ▴ 10

Perhaps move -maxdepth 1 before -name, so:

for file in `find . -maxdepth 1 -name "*.bed"`

etc.
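
A minimal sketch of the loop with the option order fixed is shown here. As an additional change not discussed above, it reads null-delimited names from find instead of word-splitting its output, so file names containing spaces are handled. The temporary directory and the sample file names are invented for the demonstration.

```shell
#!/bin/bash
set -euo pipefail

# Sample files (illustrative): one name contains a space,
# and one sits in a subdirectory that -maxdepth 1 excludes
workdir=$(mktemp -d)
printf 'chr17\t42276210\t42276219\n' > "${workdir}/X2_example.bed"
printf 'chr3\t18467066\t18467075\n' > "${workdir}/Y2_example copy.bed"
mkdir "${workdir}/sub"
printf 'chr1\t1\t2\n' > "${workdir}/sub/Z2_example.bed"

titles=()
# -maxdepth comes before -name, so find emits no warning;
# -print0 with read -d '' keeps each file name intact
while IFS= read -r -d '' file; do
    title=$(basename "${file}" | awk -F'[_.]' '{print $1}')
    titles+=("${title}")
done < <(find "${workdir}" -maxdepth 1 -name "*.bed" -print0 | sort -z)

printf '%s\n' "${titles[@]}"
```

Inside the loop, the sed or awk command from the answer can take the place of the `titles+=` line.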

7.2 years ago by Alex Reynolds 35k

Hi, it worked despite the warning. Thank you, I got the result I needed.

7.2 years ago by Debbie ▴ 10