How to convert a txt file to a BED file
3.6 years ago
evafinegan • 0

Hello,

I have a text file that I want to convert to a BED file. My text file looks like this:

S_12g0010.2.1:0-170
S_12g0030.2.1:0-476
S_12g0040.2.1:0-94
S_12g0080.2.1:0-259
S_12g0020.2.1:0-382

Can anyone suggest a way to do this conversion? Thank you!

next-gen • 1.3k views

Can anyone suggest a way to do this conversion?

tr
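A minimal sketch of the `tr` route, assuming no sequence ID contains ":" or "-" itself. `tr` translates characters one for one, so every ":" and every "-" on a line becomes a tab. The helper name `txt_to_bed_tr` is made up for illustration.

```shell
# Hypothetical helper: translate every ':' and '-' into a tab.
# Only safe when the sequence IDs themselves contain neither character.
txt_to_bed_tr() {
    tr ':-' '\t\t'
}

printf 'S_12g0010.2.1:0-170\nS_12g0030.2.1:0-476\n' | txt_to_bed_tr
```

An ID such as S_12g0000.1-c2.1 would be split at its internal "-", so this only suits input like the five lines in the question.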

3.6 years ago
JC 13k

First, check the BED format definition.

Second, use a tool such as tr, sed, or perl to do the conversion, for example:

perl -pe 's/:/\t/; s/-(\d+)$/\t$1/' < in.txt > out.bed

Edit: updated to support "-" in sequence name


Thank you, I have used your suggestion. It made a bed file but I see some strange things in some of the lines:

S_12g00500.1.1  0   273
S_12g0000.1 c2.1    0-210

I think the second line should look like this:

  S_12g0000.1c2.1   0 210

The original line in txt file looks like this:

S_12g0000.1-c2.1:0-210

Are you sure you used the same input and command?

$ cat > in.txt
S_12g0010.2.1:0-170
S_12g0030.2.1:0-476
S_12g0040.2.1:0-94
S_12g0080.2.1:0-259
S_12g0020.2.1:0-382

$ perl -pe "s/:/\t/; s/-/\t/" < in.txt
S_12g0010.2.1   0       170
S_12g0030.2.1   0       476
S_12g0040.2.1   0       94
S_12g0080.2.1   0       259
S_12g0020.2.1   0       382

I see you have a "-" in the sequence ID. To handle that, you can use:

perl -pe "s/:/\t/; s/-(\d+)$/\t$1/"

Using this the output is:

S_12g0010.2.1   0       
S_12g0030.2.1   0      
S_12g0040.2.1   0       
S_12g0080.2.1   0       
S_12g0020.2.1   0       
S_12g0000.1-c2.1    0

Now there is no third column.


This works: perl -pe "s/:/\t/; s/-(\d+)$/\t\1/" Thank you!


Yes, I used the same command that you suggested. My input text file has lines like these:

S_12g0010.2.1:0-170
S_12g0030.2.1:0-476
S_12g0040.2.1:0-94
S_12g0080.2.1:0-259
S_12g0020.2.1:0-382    
S_12g0000.1-c2.1:0-210

Your suggestion works well for all lines except the last one, where I get output like this:

S_12g0010.2.1   0       170
S_12g0030.2.1   0       476
S_12g0040.2.1   0       94
S_12g0080.2.1   0       259
S_12g0020.2.1   0       382
S_12g0000.1 c2.1    0-210

$ cat > in.txt
S_12g0010.2.1:0-170
S_12g0030.2.1:0-476
S_12g0040.2.1:0-94
S_12g0080.2.1:0-259
S_12g0020.2.1:0-382
S_12g0000.1-c2.1:0-210
$ perl -pe 's/:/\t/; s/-(\d+)$/\t$1/' < in.txt
S_12g0010.2.1   0       170
S_12g0030.2.1   0       476
S_12g0040.2.1   0       94
S_12g0080.2.1   0       259
S_12g0020.2.1   0       382
S_12g0000.1-c2.1        0       210
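To confirm that output like the above fits the three-column BED layout (name, start, end, tab-separated), a quick check can be sketched with awk. The helper name `check_bed3` is hypothetical, and the rule that start must not exceed end is an assumption of this sketch, not a quote from the format definition.

```shell
# Hypothetical helper: succeed only if every line has three tab-separated
# fields, with non-negative integer start/end and start not greater than end.
check_bed3() {
    awk -F'\t' '
        NF != 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 {
            printf "line %d looks wrong: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    '
}

printf 'S_12g0000.1-c2.1\t0\t210\n' | check_bed3 && echo "looks like 3-column BED"
```

Running `check_bed3 < out.bed` on the converted file would flag lines such as the mis-split one reported earlier in the thread.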

@JC: Please update your answer with the version that accounts for "-" characters in sequence IDs.

@evafinegan: Once JC updates it, please accept the answer using the green check mark below the upvote button.
