Renaming and editing a bed file
7.2 years ago
Debbie ▴ 10

I have around 80 BED files, each with only the first 3 columns (example: X2_example.bed, where X2 is the gene name). I want to add a 4th column containing the gene name and save each file under a new name (examples attached: X2_example_edited.bed, Y2_example_edited.bed, and so on), then merge these files into a single BED file.

I can add the 4th column with the gene name and save the file under a different name with this command:

sed 's/$/\tX2/' < X2_example.bed > X2_example_edited.bed

This is the generated bed file

chr17 42276210 42276219 X2

chr17 42297938 42297947 X2

chr17 42276210 42276219 X2

chr17 42297938 42297947 X2

But I have to do this separately for each BED file. Is there a way to extract the gene name from the file name (e.g. X2 from X2_example.bed), add it as the 4th column of the BED file, and save the result as X2_example_edited.bed?

I can extract the gene name from the file name with: echo "X2_example.bed" | awk -F'[_.]' '{print $1}'

However, as I have many files, I am looking for a way to write a loop that automates this.

I also need to merge all the generated BED files, which I can do with:

cat *_edited.bed >output.bed

However, I am getting an error (see attached example: output.bed): the last line of the 1st file and the 1st line of the next file end up on the same line.

chr3 18467066 18467075 Y2chr17 42276210 42276219 X2
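
A joined line like this usually means the last line of one input file has no trailing newline, so cat writes the next file directly after it. The sketch below shows one possible workaround, assuming that is the cause here: `awk 1` prints every input line followed by a newline, including a final line that lacks one. The scratch directory and file contents are made up for illustration.

```shell
#!/bin/bash
# Work in a scratch directory so no real files are touched (illustrative data only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# X2 file deliberately ends WITHOUT a trailing newline; Y2 file ends with one.
printf 'chr17\t42276210\t42276219\tX2' > X2_example_edited.bed
printf 'chr3\t18467066\t18467075\tY2\n' > Y2_example_edited.bed

# "cat" would glue the two records together here.
# "awk 1" prints each record followed by a newline, so the records stay separate.
awk 1 *_edited.bed > output.bed
```

The merged file is written to output.bed, which does not match the `*_edited.bed` pattern, so it is not read back as an input.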

I know this must be a very basic thing, but I am new to this analysis and have limited knowledge. Thanks in advance

ChIP-Seq Bed sed awk • 2.8k views
7.2 years ago

Here's a way to do this with BEDOPS bedops --everything and a for loop in bash:

Note that the blank space in the sed command before ${title} is a literal tab: press Control-V and then the Tab key to type it.

#!/bin/bash
for file in `find . -name "*.bed" -maxdepth 1`
do
   title=`basename "${file}" | awk -F'[_.]' '{print $1}'`
   sed 's/$/'"	${title}"'/' "${file}" > "${file}.edited.bed"
done
bedops --everything *.edited.bed > union.bed

Here's another way to do it without sed:

#!/bin/bash
for file in `find . -name "*.bed" -maxdepth 1`
do
   title=`basename "${file}" | awk -F'[_.]' '{print $1}'`
   awk -v title="${title}" '{print $0"\t"title;}' "${file}" > "${file}.edited.bed"
done
bedops --everything *.edited.bed > union.bed
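
Both loops above write to ${file}.edited.bed, which gives names like X2_example.bed.edited.bed. If the X2_example_edited.bed naming from the question is wanted, a minimal sketch using a shell glob and bash parameter expansion might look like the following. It assumes every input is named <gene>_<something>.bed and that gene names contain no underscore; the scratch directory and sample coordinates are only there to make the example self-contained, and plain awk is used for the final merge instead of bedops.

```shell
#!/bin/bash
# Scratch directory with two small example inputs (illustrative data only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr17\t42276210\t42276219\nchr17\t42297938\t42297947\n' > X2_example.bed
printf 'chr3\t18467066\t18467075\n' > Y2_example.bed

for file in *.bed; do
    # Skip outputs from a previous run so they are not processed again.
    case "$file" in *_edited.bed|output.bed) continue ;; esac

    gene=${file%%_*}              # text before the first "_", e.g. X2
    out=${file%.bed}_edited.bed   # X2_example.bed -> X2_example_edited.bed

    # Append the gene name as a tab-separated 4th column.
    awk -v gene="$gene" 'BEGIN { OFS = "\t" } { print $0, gene }' "$file" > "$out"
done

# Concatenate the edited files; awk 1 ends every record with a newline.
awk 1 *_edited.bed > output.bed
```

Because awk's print always ends each record with a newline, the edited files can be concatenated without lines running together.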

I am getting a warning:

find: warning: you have specified the -maxdepth option after a non-option argument -name, but options are not positional (-maxdepth affects tests specified before it as well as those specified after it). Please specify options before other arguments.


Perhaps move -maxdepth 1 before -name, so:

for file in `find . -maxdepth 1 -name "*.bed"`

etc.


Hi, it worked despite the warning. Thank you, I got the result I needed.
