Question: How To Convert A Basic Bed File With Only 3 Columns (Chrname, Start, End Site) Into A Bigger Bed With 6 Columns
8.0 years ago by
Hamilton280
Hamilton280 wrote:

Hi,

My BED file has only 3 columns: chromosome name, start, and end. MACS in Galaxy, however, requires a BED file with 6 columns. How can I convert it?

macs bed chip-seq • 4.5k views
modified 24 months ago by Biostar ♦♦ 20 • written 8.0 years ago by Hamilton280

The question is not clear. Can you give example input and output, and say which additional columns you want to add?

written 8.0 years ago by Rm7.9k
8.0 years ago by
Gjain5.4k
Munich, Germany
Gjain5.4k wrote:

If you have no extra information, you can add name, score, and strand as columns 4, 5, and 6: use . (dot) for column 4, 0 (zero) for column 5, and + (plus strand) for column 6.

If you can give an example of what kind of data is available to you, I can modify my answer to have the correct name, score and strand.

8.0 years ago by
Wolf130
Wolf130 wrote:

MACS uses strand information (which would be in column 6) for the fragment size model it builds. If you want to use MACS and you expect your peaks to be narrow (i.e. you would want to use the model building step), I think you should try to get the strand information from whatever aligner you used into your bed file. Without knowing more, I can't help you with how to do that.

If you don't have the strand information (i.e. if you made them all + strand), you have to use the --nomodel option.


I got this BED file from the authors of the Wang et al. 2011 PNAS paper, where it is publicly available. It has only 3 columns. What if I add . for column 4, 0 for column 5, and + for column 6, as Gjain suggested, given that I have no information beyond the basic coordinates? Could this bias the results?

written 8.0 years ago by Hamilton280

It means that you can't ask MACS to estimate the ChIP fragment size from the data (i.e. use --nomodel). Usually you would use the fragment size to shift/extend plus-strand reads to the right and minus-strand reads to the left, so that they cover the actual binding site that was pulled down. You won't be able to do this, so you should set the shift size to 0. That will reduce your resolution somewhat, but depending on what you are planning to do, it might still be OK.

written 8.0 years ago by Wolf130
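
The strand-dependent extension described in the reply above can be sketched in awk. This is only an illustration of the idea, not what MACS does internally: the helper name extend_reads and the 200 bp fragment size are assumptions, the input is assumed to be tab-separated BED6 with the strand in column 6, and coordinates are not clipped to chromosome ends.

```shell
# extend_reads FRAGSIZE: read tab-separated BED6 on stdin and extend each read
# to FRAGSIZE bp along its strand (hypothetical helper, for illustration only).
extend_reads() {
  awk -v frag="$1" 'BEGIN { FS = OFS = "\t" }
    $6 == "+" { $3 = $2 + frag }                      # plus strand: extend to the right
    $6 == "-" { $2 = $3 - frag; if ($2 < 0) $2 = 0 }  # minus strand: extend to the left
    { print }'                                        # other strands pass through unchanged
}

# Example: two 36 bp reads extended to an assumed 200 bp fragment size.
printf 'chr1\t1000\t1036\t.\t0\t+\nchr1\t5000\t5036\t.\t0\t-\n' | extend_reads 200
```

If every read is labelled + (or .), this kind of extension either pushes all reads in one direction or does nothing, which is why a real strand column matters for the model-building step.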
4.2 years ago by
Fidel1.9k
Germany
Fidel1.9k wrote:

I think this should work

cat bed3.bed | perl -lane 'print "$F[0]\t$F[1]\t$F[2]\t.\t0\t."' > bed6.bed

 

4.2 years ago by
cpad011212k
India
cpad011212k wrote:

see if this works:

awk -v OFS='\t' '{print $1,$2,$3,".",".","."}' bed3.txt > bed6.txt

If the last three columns should be empty instead of ".", this may work:

awk -v OFS='\t' '{print $1,$2,$3,"","",""}' bed3.txt > bed6.txt

In theory, the 5th column should be a score from 0 to 1000 (see https://genome.ucsc.edu/FAQ/FAQformat.html#format1), which is why 0 is better than '.'.

written 4.2 years ago by Fidel1.9k
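
Putting the suggestions in this thread together, a slightly more defensive conversion might look like the sketch below. The helper name bed3_to_bed6 is hypothetical. It assumes tab-separated input, skips track, browser, and comment lines, reports lines with fewer than 3 columns on stderr, and writes . for the name, 0 for the score, and a strand taken from an optional argument (default .).

```shell
# bed3_to_bed6 [STRAND]: read tab-separated BED3 on stdin, write BED6 on stdout.
# Hypothetical helper: name is ".", score is 0, strand defaults to ".".
bed3_to_bed6() {
  awk -v strand="${1:-.}" 'BEGIN { FS = OFS = "\t" }
    /^(track|browser|#)/ { next }   # skip header and comment lines
    NF < 3 {                        # report malformed lines on stderr and skip them
      print "skipping line " NR ": fewer than 3 columns" > "/dev/stderr"
      next
    }
    { print $1, $2, $3, ".", 0, strand }'
}

# Example: convert two intervals, labelling every row as plus strand.
printf 'chr1\t100\t200\nchr2\t300\t400\n' | bed3_to_bed6 +
```

Usage on a file would be along the lines of bed3_to_bed6 < bed3.bed > bed6.bed. Keep in mind the point made earlier in the thread: a made-up strand column only satisfies the format, so MACS should then be run with --nomodel.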