Question

[Resolved] Help in Data management : regrouping result

8.4 years ago

giroudpaul ▴ 70

Hello !

I am trying to build a file that combines the output from FIMO with the output from HOMER annotatePeaks. In the end, I want one big tab-separated file with the peak data, the sequence that matched my consensus motif, and then the HOMER annotation.

This is my FIMO output (note: I also have it with chr/start/end positions, but I can't have both in the same file):

#name   strand   score   p-value    q-value   match_seq
MACS_peak_625    -    25.5902    1.93e-09    0.00498    GAGTTCACCGAGTTCA
MACS_peak_4881    +    25.4426    2.41e-09    0.00498    GAGTTCACTGAGTTCA
MACS_peak_16939    +    25.0984    3.08e-09    0.00498    GAGTTCATAGAGTTCA
MACS_peak_6882    -    25.0984    3.08e-09    0.00498    GAGTTCATAGAGTTCA
MACS_peak_4617    -    25.0984    3.08e-09    0.00498    GAGTTCATAGAGTTCA
MACS_peak_6695    +    25.0164    3.51e-09    0.00498    GAGTTCACTGGGTTCA
MACS_peak_14937    +    24.5902    4.36e-09    0.00514    GGGTTCACTGGGTTCA
MACS_peak_16708    +    24.2295    4.84e-09    0.00514    GAGTTCACAGAGTTCA

And this is the HOMER annotation file for my .bed:

PeakID   Chr    Start    End    Strand    Peak Score    Rest_of_annotation_columns
MACS_peak_9638    chr17    39985383    39985583    +    529.00   ...

Is there a way to "grep" and concatenate both files so that I have:

MACS_peak_X    chrX    start    end    strand   score(frombed)  match_seq    score(fromfimo)    p-value q-value    rest_of_homer_annotation...

grep fimo bed homer • 1.7k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.4 years ago by giroudpaul ▴ 70

score 3 · Answer 1 · 2015-11-25

8.4 years ago

GenoMax 141k

Disclaimer: the column headers will be sorted and joined like ordinary rows in this solution, so they end up misplaced in the output (or you could remove them beforehand and then run the command).

$ join <( sort fimo_output) <(sort bed_file) | tac | awk '{print $1,$7,$8,$9,$11,$6,$3,$4,$5}' | column -t

Field numbers in the awk part will have to be changed based on how many additional columns you have in your bed file. You should be able to figure that part out.

Based on a solution here.
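
If the HOMER annotation columns contain spaces (for example "Peak Score" or free-text annotations), whitespace-based field splitting can shift the column numbers. A minimal sketch of a tab-aware variant follows. It assumes both files are tab-separated with exactly one header line and that the HOMER file starts with the six columns shown in the question. The file names fimo.txt and homer.txt are hypothetical, and the MACS_peak_625 annotation row is invented purely so the example has a matching peak.

```shell
#!/bin/sh
# Sketch only: file names and the sample annotation row are made up for illustration.
tmp=$(mktemp -d)
TAB=$(printf '\t')

# Tiny stand-ins for the two real files (tab-separated, one header line each).
printf '#name\tstrand\tscore\tp-value\tq-value\tmatch_seq\n' > "$tmp/fimo.txt"
printf 'MACS_peak_625\t-\t25.5902\t1.93e-09\t0.00498\tGAGTTCACCGAGTTCA\n' >> "$tmp/fimo.txt"
printf 'PeakID\tChr\tStart\tEnd\tStrand\tPeak Score\tAnnotation\n' > "$tmp/homer.txt"
printf 'MACS_peak_625\tchr1\t100\t300\t+\t529.00\tintron (example)\n' >> "$tmp/homer.txt"
printf 'MACS_peak_9638\tchr17\t39985383\t39985583\t+\t529.00\tintergenic\n' >> "$tmp/homer.txt"

# Drop the header line of each file, then sort both on the peak ID (field 1).
tail -n +2 "$tmp/fimo.txt"  | LC_ALL=C sort -t "$TAB" -k1,1 > "$tmp/fimo.sorted"
tail -n +2 "$tmp/homer.txt" | LC_ALL=C sort -t "$TAB" -k1,1 > "$tmp/homer.sorted"

# Join on the peak ID with tab as the only delimiter, HOMER columns first.
# After the join: $1 = ID, $2-$6 = chr/start/end/strand/peak score,
# $7 onward = remaining HOMER columns, and the last five fields are
# FIMO's strand, score, p-value, q-value, match_seq.
merged=$(LC_ALL=C join -t "$TAB" "$tmp/homer.sorted" "$tmp/fimo.sorted" |
  awk -F '\t' -v OFS='\t' '{
    n = NF
    # ID, chr, start, end, strand, peak score, match_seq, FIMO score, p-value, q-value
    line = $1 OFS $2 OFS $3 OFS $4 OFS $5 OFS $6 OFS $n OFS $(n-3) OFS $(n-2) OFS $(n-1)
    # then whatever HOMER annotation columns remain
    for (i = 7; i <= n - 5; i++) line = line OFS $i
    print line
  }')

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Because the FIMO fields are addressed relative to the end of the line, the awk part should not need editing when the HOMER file has more or fewer annotation columns. Peaks without a FIMO hit (MACS_peak_9638 in this toy data) are dropped, since join only keeps IDs present in both files.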

Thank you, it worked!

ADD REPLY • link 8.4 years ago by giroudpaul ▴ 70