From txt to BED
7.1 years ago
dimitrischat ▴ 210

Hello again. I downloaded some .txt files from GEO. The regions are written as chrx:11111-2222 and use the mm8 genome version. I opened the .txt file in Excel, copied all the chrx entries, pasted them into the box of the UCSC liftOver tool and changed the conversion from mm8 to mm9. I then get a BED file, but it is again in the chrx:11111-2222 format. I know that in BED format the chromosome, start and end have to be in separate columns. How do I turn this (new) BED file into a usable one? I hope this makes sense.

ChIP-Seq

When you download the file from UCSC, it should already be in tab-separated BED format. Are you not able to use the file as is?

If you edited it in Excel, make sure you save it as tab-delimited text.


I downloaded from GEO

The GEO supplementary data comes in a multitude of formats.

7.1 years ago
A. Domingues ★ 2.7k

You have mixed up a few things:

  1. mm8 and mm9 are not formats. They are genome versions (assemblies), specifically of Mus musculus.
  2. The format chrx:11111-2222 is not BED, so you will need to convert it to chrx 11111 2222, with the three fields separated by tabs. I assume you don't know how to use the command line to do this? If you don't, use the Galaxy tool "Convert delimiters to TAB".
  3. I assume nothing gets converted from mm8 -> mm9 because the file format is not correct, but I am not sure. In any case, convert the coordinates to BED first, and then do the mm8 -> mm9 conversion.
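Point 3 suggests liftOver may not convert anything when the input is malformed. As a hedged sketch (the function name `check_bed` is hypothetical and not part of any UCSC tool), this awk check could be run on the converted file before uploading it. It reports lines that lack three tab-separated fields with integer coordinates, or whose start is larger than the end, and returns a non-zero exit status if it finds any.

```shell
# Hypothetical helper: flag lines that do not look like a minimal BED record
# ("chrom<TAB>start<TAB>end", integer coordinates, start <= end).
# Reads the files given as arguments, or standard input if none are given.
check_bed() {
    awk -F'\t' '
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 {
            # Report the offending line on stderr and remember the failure
            print "suspicious line " NR ": " $0 > "/dev/stderr"
            bad++
        }
        # Exit status 1 if any line was flagged, 0 otherwise
        END { exit (bad > 0) }
    ' "$@"
}

# Usage:
#   check_bed file.bed && echo "file.bed looks like BED"
```

A line such as chr1t4842133t4842148 is flagged because it contains no tab, so awk sees a single field. Header or track lines would also be reported, and so would the placeholder chrx:11111-2222 used in this thread, since its start is larger than its end.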

Edit:

Since you are learning how to use the command line, say your file is file.txt.gz:

# test on a single string (GNU sed; it understands \t as a tab)
echo chrx:11111-2222 | sed 's/:/\t/' | sed 's/-/\t/'

# gzipped file
zcat file.txt.gz | sed 's/:/\t/g' | sed 's/-/\t/g' > file.bed

# uncompressed file
sed 's/:/\t/g' file.txt | sed 's/-/\t/g' > file.bed

That should work.


Edit 2: apparently OS X (and some other systems) ship a sed that behaves differently; BSD sed does not turn \t into a tab. See the comments for solutions from StackOverflow.
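A hedged alternative that sidesteps the GNU/BSD sed difference is tr, whose POSIX specification includes the \t escape, so it should behave the same on Linux and OS X. The wrapper name `pos_to_bed` is hypothetical. Like the sed commands, it replaces every : and - on each line, so it assumes the file contains only the chr:start-end strings.

```shell
# Hypothetical helper: turn "chr:start-end" lines from standard input into
# tab-separated "chr<TAB>start<TAB>end" lines.
# tr maps ':' to a tab and '-' to a tab; the trailing '-' in ':-' is a
# literal hyphen, not a range.
pos_to_bed() {
    tr ':-' '\t\t'
}

# Usage:
#   gzip -dc file.txt.gz | pos_to_bed > file.bed
#   pos_to_bed < file.txt > file.bed
```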


Another note: please format your question; it is very hard to read and understand. If you make our job hard (ours being the people trying to help), you are less likely to get an answer.


1. Yes, they are genome versions; I know, wrong use of the word "format". Yes, I downloaded .txt.gz files, but in UCSC liftOver you can paste positions like chrz:1111-2222 directly (all the chrx entries, I think). 2. I know how to use the terminal and command line (I am just starting to learn). Is there a command for this? 3. Maybe; I am not sure about that either.


See my edited answer. The answer assumes access to a Unix system (OSX or Linux).


Now I get this: chr1:4842133-4842148 -> chr1t4842133t4842148. The chr, start and stop should be in separate columns.


Depending on your system, one of these solutions should work.


Thanks a lot! Much appreciated!


If this solution has solved your problem then go ahead and accept it (green check mark) to provide closure for this thread.

7.1 years ago

Using sed is problematic because its behavior (for example, whether \t means a tab) differs between the GNU and BSD versions. You could use awk instead:

$ awk -F"[:-]" 'BEGIN{ OFS="\t"; }{ print $1, $2, $3; }' in.txt > out.bed

For example:

$ echo chrx:11111-2222 | awk -F"[:-]" 'BEGIN{ OFS="\t"; }{ print $1, $2, $3; }'
chrx    11111   2222
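One caveat, which is an assumption to verify rather than something stated in this thread: in the UCSC browser's position convention, strings like chr1:4842133-4842148 are 1-based and inclusive, while BED starts are 0-based (the end coordinate is the same in both). If the GEO file follows the browser convention (check the series description), the start should be reduced by 1 during the conversion. A minimal sketch of the same awk command with that shift, wrapped in a hypothetical function `pos_to_bed0`:

```shell
# Hypothetical helper: convert 1-based, inclusive "chr:start-end" positions
# into 0-based, half-open BED intervals (start - 1, end unchanged).
# Only use this if the input really follows the 1-based convention.
pos_to_bed0() {
    awk -F'[:-]' 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $3 }' "$@"
}

# Usage:
#   pos_to_bed0 in.txt > out.bed
```

If the GEO strings are already 0-based, the unshifted awk command above is the right one; applying the shift twice would move every interval one base to the left.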