Question: How to keep column 6 (normalized tag count) of the peaks.txt file produced by HOMER peak calling after pos2bed conversion?
Ming Lu0 wrote:

I use HOMER to call peaks and get a peaks.txt file. Then I use pos2bed.pl to convert peaks.txt to peaks.bed. However, column 6, which holds the normalized tag count (comparable to RPKM, reflecting peak density), is lost in the conversion.

chip-seq • 526 views
ADD COMMENTlink modified 11 months ago by prakash480 • written 11 months ago by Ming Lu0
prakash480 wrote:

A simple combination of "grep", "cut" and "awk" can do the job.

grep -v "#" peak.txt |cut -f 1,2,3,4,6 | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}' >peak.bed

ADD COMMENTlink modified 11 months ago • written 11 months ago by prakash480

Thank you. I used the following code, and column 6 is now kept after bedtools intersect:

cut -f 1,2,3,4,6 peaks.txt | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}' >peak1.bed
pos2bed.pl peaks.txt > peak2.bed
awk 'NR==FNR {h[$4] = $5; next} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"h[$4]}' peak1.bed peak2.bed >peaks.bed
chr11   117467921   117468098   chr11-2 1   +   86.8
chr17   39636555    39636732    chr17-2 1   +   85.6
chr2    231281278   231281455   chr2-2  1   +   83.3
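The two-file awk idiom in the last command (NR==FNR) can be sketched on toy data. This is only an illustration: the file contents below are made-up values in the same layout as peak1.bed and peak2.bed above, not real HOMER output.

```shell
tmp=$(mktemp -d)

# peak1.bed: chr, start, end, peak ID, normalized tag count (toy values)
printf 'chr1\t100\t200\tchr1-1\t50.0\nchr2\t300\t400\tchr2-1\t40.0\n' > "$tmp/peak1.bed"
# peak2.bed: six-column BED with the same peak IDs in column 4 (toy values)
printf 'chr1\t100\t200\tchr1-1\t1\t+\nchr2\t300\t400\tchr2-1\t1\t+\n' > "$tmp/peak2.bed"

# While reading the first file (NR == FNR), store tag count keyed by peak ID.
# While reading the second file, append the stored value as a 7th column.
joined=$(awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { h[$4] = $5; next }
     { print $1, $2, $3, $4, $5, $6, h[$4] }' "$tmp/peak1.bed" "$tmp/peak2.bed")

printf '%s\n' "$joined"
```

Because the lookup is keyed on the peak ID in column 4, the two files do not need to be in the same order, but every ID in the second file should also exist in the first, otherwise the 7th column is left empty.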

But I still have one question: does "#" stand for any pattern I can supply? I did not use it.

ADD REPLYlink modified 11 months ago • written 11 months ago by Ming Lu0

But I still have one question: does "#" stand for any pattern I can supply? I did not use it.

Yes, you can put any pattern within the double quotes. In this case, the comment lines in the peak file (those containing "#") are not needed, so grep -v "#" is used to filter them out.
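One caveat worth sketching: an unanchored "#" matches the character anywhere in a line, while "^#" matches only lines that start with it. The small file below is hypothetical (the header text and the peak ID containing "#" are assumptions made up for illustration).

```shell
tmp=$(mktemp -d)

# Hypothetical peak file: one header line, one normal peak,
# and one peak whose ID happens to contain "#".
printf '#PeakID\tchr\tstart\tend\tstrand\tNormalized Tag Count\n' > "$tmp/peaks.txt"
printf 'chr1-1\tchr1\t100\t200\t+\t50.0\n' >> "$tmp/peaks.txt"
printf 'peak#2\tchr2\t300\t400\t+\t40.0\n' >> "$tmp/peaks.txt"

# Unanchored: drops every line containing "#", including the peak#2 row.
unanchored=$(grep -vc "#" "$tmp/peaks.txt")
# Anchored: drops only lines that begin with "#".
anchored=$(grep -vc "^#" "$tmp/peaks.txt")

echo "unanchored kept $unanchored line(s), anchored kept $anchored line(s)"
```

If peak IDs never contain "#", both forms give the same result.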

ADD REPLYlink written 11 months ago by prakash480

Why do we have to remove the lines with #, given that they did not affect the intersect operation or its result? Even with HOMER's pos2bed.pl (.txt > .bed), the new .bed file keeps the lines with #.
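One possible reason, sketched under assumptions: when the cut and awk commands reorder columns, a header line that is not filtered out gets reordered too, so it no longer starts with "#". The header text below is an assumed layout for illustration, not copied from a real peaks.txt.

```shell
# Assumed header line, tab-separated, starting with "#".
header=$(printf '#PeakID\tchr\tstart\tend\tstrand\tNormalized Tag Count')

# Same cut + awk reordering as in the thread, applied to the header only.
reordered=$(printf '%s\n' "$header" \
  | cut -f 1,2,3,4,6 \
  | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}')

# The "#" now sits in column 4, so the line looks like a data row.
printf '%s\n' "$reordered"
```

Whether a downstream tool tolerates such a row depends on the tool. In the two-file approach above, peak1.bed is only used as a lookup table, which may be why the extra lines caused no visible problem there.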

ADD REPLYlink written 10 months ago by Ming Lu0

You can further shorten the code:

cut -f 1-6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"

cut accepts field ranges, and awk can apply one output delimiter to all columns via OFS. IMO, that much code is not necessary. Please try the following:

OP:

cut -f 1,2,3,4,6 peaks.txt | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}' >peak1.bed

New code if you have lines with #:

grep -v "#" peak.txt |cut -f 2-4,1,6 > peak1.bed

New code if you do not have lines with #:

cut -f 2-4,1,6 peak.txt > peak1.bed
ADD REPLYlink modified 11 months ago • written 11 months ago by cpad01128.8k

grep -v "#" peak.txt |cut -f 2-4,1,6 > peak1.bed

Actually, with this code the column order will not change, because cut always outputs fields in their original order regardless of the order given to -f. So yes, the shorter code you mentioned below will serve the purpose.

cut -f 1-6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"
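A quick sketch of the difference, using a single made-up row whose values are just column labels: cut ignores the order in the -f list, while awk prints fields in whatever order you ask for.

```shell
# One toy tab-separated row; the values name the columns for readability.
row=$(printf 'id\tchr\tstart\tend\tstrand\tnorm')

# cut selects fields 1,2,3,4,6 but keeps them in file order.
cut_out=$(printf '%s\n' "$row" | cut -f 2-4,1,6)

# awk reorders explicitly: chr, start, end, id, norm.
awk_out=$(printf '%s\n' "$row" \
  | awk 'BEGIN { FS = OFS = "\t" } { print $2, $3, $4, $1, $6 }')

printf 'cut: %s\nawk: %s\n' "$cut_out" "$awk_out"
```

Note that this awk call reads field 6 directly, so no cut step is needed before it.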
ADD REPLYlink modified 11 months ago • written 11 months ago by prakash480

Oops... I didn't notice that the 5th column was left out.

$ cut -f 1-6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"

should be

$ cut -f 1-4,6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"
ADD REPLYlink modified 11 months ago • written 11 months ago by cpad01128.8k