Question: How to keep column 6 (normalized tag count) of a HOMER peaks.txt file after converting it with pos2bed.pl?
Ming Lu (Australia) wrote:

I use HOMER to call peaks, which gives a peaks.txt file. I then use pos2bed.pl to convert peaks.txt to peaks.bed. However, column 6, which holds the normalized tag count (a measure of peak density, similar to RPKM), is lost in the conversion.

Tag: chip-seq
prakash wrote:

A simple combination of grep, cut and awk can do the job:

grep -v "#" peak.txt |cut -f 1,2,3,4,6 | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}' >peak.bed

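For comparison, here is a minimal sketch that does the same conversion in a single awk call. It assumes the usual HOMER peaks.txt layout (1 PeakID, 2 chr, 3 start, 4 end, 5 strand, 6 normalized tag count) with tab-separated columns; the function name homer_to_bed is hypothetical. Like the pipeline above, it copies start and end unchanged and does not reproduce any other adjustments pos2bed.pl may make.

```shell
# Hypothetical helper: read HOMER peaks.txt on stdin, write a 5-column BED
# (chr, start, end, PeakID, normalized tag count) on stdout.
homer_to_bed() {
  # -F'\t' splits on tabs only, so header names containing spaces stay whole;
  # '!/^#/' skips only lines that START with "#" (comments and the header).
  awk -F'\t' -v OFS='\t' '!/^#/ { print $2, $3, $4, $1, $6 }'
}
```

Usage: homer_to_bed < peaks.txt > peaks.bed. Anchoring the pattern with ^# drops only lines that begin with #, which is slightly stricter than grep -v "#".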

Thank you. I used the following code, and column 6 is now kept after bedtools intersect:

cut -f 1,2,3,4,6 peaks.txt | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}' >peak1.bed
pos2bed.pl peaks.txt > peak2.bed
awk 'NR==FNR {h[$4] = $5; next} {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"h[$4]}' peak1.bed peak2.bed >peaks.bed

Resulting peaks.bed (excerpt):

chr11   117467921   117468098   chr11-2 1   +   86.8
chr17   39636555    39636732    chr17-2 1   +   85.6
chr2    231281278   231281455   chr2-2  1   +   83.3

But I still have one question: does "#" stand for any pattern I can put there? I did not use it.

(reply by Ming Lu)
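The awk line in the reply above uses the common NR==FNR idiom: NR equals FNR only while the first file is being read (provided it is not empty), so for peak1.bed it stores column 5 under the peak ID in column 4, and for peak2.bed it looks up each line's column 4. A minimal sketch of the same join that reads peaks.txt directly, so the intermediate peak1.bed is not needed, and passes # lines through unchanged. It assumes the peak ID is column 1 of peaks.txt and column 4 of the pos2bed.pl output, as in the excerpt above; add_tag_count is a hypothetical name.

```shell
# Hypothetical helper: append HOMER's normalized tag count (column 6 of
# peaks.txt) to each record of the pos2bed.pl output, matched on peak ID.
# Usage: add_tag_count peaks.txt pos2bed_output.bed > peaks.bed
add_tag_count() {
  awk -v OFS='\t' '
    # First file: remember column 6 under the peak ID in column 1,
    # skipping HOMER comment and header lines.
    NR == FNR { if ($0 !~ /^#/) count[$1] = $6; next }
    # Second file: keep comment or track lines (if any) unchanged.
    /^#/ || /^track/ { print; next }
    # Otherwise append the stored count as a new last column.
    { print $0, count[$4] }
  ' "$1" "$2"
}
```

Because no field is modified, $0 is printed exactly as pos2bed.pl wrote it, with the count added after a tab.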

But I still have one question: does "#" stand for any pattern I can put there? I did not use it.

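Yes, you can put any pattern inside the double quotes. Here, the comment lines in the peak file (the lines containing "#") are not needed, so grep -v "#" filters them out: -v prints only the lines that do NOT match the pattern. Note that "#" matches a # anywhere in a line; grep -v "^#" would drop only lines that start with #.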

(reply by prakash)

Why do we have to remove the lines with #? They did not affect the intersect step or its result, and even HOMER's pos2bed.pl keeps the # lines in the new .bed file.

(reply by Ming Lu)
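A likely reason, sketched under assumptions: tools such as bedtools skip lines that begin with #, so those lines are harmless in the pos2bed.pl output, which the OP used as the main file in the join (the OP's awk keeps $1 first, so those lines still start with #). In the cut/awk pipeline from the answer, however, every line has its columns reordered, including the header, which then no longer starts with #. The snippet feeds a shortened, HOMER-style header line (assumed layout) through that step without the grep filter; reorder_without_filter is a hypothetical name.

```shell
# The reordering step from the answer above, without the grep -v "#" filter.
reorder_without_filter() {
  cut -f 1,2,3,4,6 | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}'
}

# Shortened HOMER-style header line (illustrative; the real one has more columns).
printf '#PeakID\tchr\tstart\tend\tstrand\tNormalized Tag Count\n' | reorder_without_filter
# prints (tab-separated): chr  start  end  #PeakID  Normalized
```

The output line starts with chr instead of #, so a BED tool would likely try to parse it as a record with a non-numeric start. Filtering the # lines before reordering avoids that.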

You can shorten the code further:

cut -f 1-6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"

cut accepts a range of fields, and awk can apply a single output delimiter (OFS) to all columns. In my opinion, that much code is not necessary. Please try the following:

OP:

cut -f 1,2,3,4,6 peaks.txt | awk '{print $2"\t"$3"\t"$4"\t"$1"\t"$5}' >peak1.bed

New code if you have lines with #:

grep -v "#" peak.txt |cut -f 2-4,1,6 > peak1.bed

New code if you do not have lines with #:

cut -f 2-4,1,6 peak.txt > peak1.bed
(reply by cpad0112)

grep -v "#" peak.txt |cut -f 2-4,1,6 > peak1.bed

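Actually, this code will not change the order of the columns: cut always prints the selected fields in their original input order, whatever order they are listed in. So yes, the shorter code you mentioned, quoted below, will serve the purpose.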

cut -f 1-6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"
(reply by prakash)
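A quick toy check of the point about cut: POSIX specifies that cut writes the selected fields in input order, so reordering needs awk (or another tool). Letters stand in for HOMER's columns 1 to 6; the input is illustrative only.

```shell
# Six tab-separated toy fields a..f stand in for HOMER columns 1..6.
printf 'a\tb\tc\td\te\tf\n' | cut -f 2-4,1,6
# prints (tab-separated): a  b  c  d  f   (input order, not 2,3,4,1,6)

# cut selects the fields, awk reorders them and sets the tab delimiter.
printf 'a\tb\tc\td\te\tf\n' | cut -f 1-4,6 | awk -v OFS='\t' '{print $2,$3,$4,$1,$5}'
# prints (tab-separated): b  c  d  a  f
```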

Oops, I didn't notice that cut -f 1-6 keeps column 5 (strand), so awk's $5 is the strand rather than the normalized tag count in column 6.

$ cut -f 1-6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"

should be

$ cut -f 1-4,6 peaks.txt | awk '{print $2,$3,$4,$1,$5}' OFS="\t"
(reply by cpad0112)
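To see the difference between the two versions, here is a sketch that runs both on one made-up data row in HOMER column order (PeakID, chr, start, end, strand, normalized tag count); the function names are hypothetical. The field list must be written 1-4,6 with no space: with "1-4, 6" the shell passes 6 to cut as a separate argument, which cut then treats as a file name.

```shell
# First version: cut -f 1-6 keeps column 5 (strand), so awk's $5 is the strand.
first_version()     { cut -f 1-6   | awk '{print $2,$3,$4,$1,$5}' OFS="\t"; }
# Corrected version: cut -f 1-4,6 (no space), so awk's $5 is the tag count.
corrected_version() { cut -f 1-4,6 | awk '{print $2,$3,$4,$1,$5}' OFS="\t"; }

# Made-up data row: PeakID, chr, start, end, strand, normalized tag count.
row=$(printf 'chr1-1\tchr1\t1000\t1200\t+\t12.5')
printf '%s\n' "$row" | first_version      # last column: +
printf '%s\n' "$row" | corrected_version  # last column: 12.5
```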