Question: [Resolved] Help in Data management : regrouping result
giroudpaul wrote (5.2 years ago):

Hello !

I am trying to build a file that combines the output from FIMO with the output from HOMER annotatePeaks. In the end, I want one big tab-separated file with the peak data, the sequence that matched my consensus motif, and then the HOMER annotation.
 

This is my FIMO output (note: I also have a version with chr/start/end positions, but I can't get both in the same file):

#name   strand   score   p-value    q-value   match_seq
MACS_peak_625    -    25.5902    1.93e-09    0.00498    GAGTTCACCGAGTTCA
MACS_peak_4881    +    25.4426    2.41e-09    0.00498    GAGTTCACTGAGTTCA
MACS_peak_16939    +    25.0984    3.08e-09    0.00498    GAGTTCATAGAGTTCA
MACS_peak_6882    -    25.0984    3.08e-09    0.00498    GAGTTCATAGAGTTCA
MACS_peak_4617    -    25.0984    3.08e-09    0.00498    GAGTTCATAGAGTTCA
MACS_peak_6695    +    25.0164    3.51e-09    0.00498    GAGTTCACTGGGTTCA
MACS_peak_14937    +    24.5902    4.36e-09    0.00514    GGGTTCACTGGGTTCA
MACS_peak_16708    +    24.2295    4.84e-09    0.00514    GAGTTCACAGAGTTCA

 

And this is the HOMER annotation file for my .bed:

PeakID   Chr    Start    End    Strand    Peak Score    Rest_of_annotation_columns
MACS_peak_9638    chr17    39985383    39985583    +    529.00   ...

Is there a way to "grep" and concatenate both files, so that I end up with:

MACS_peak_X    chrX    start    end    strand   score(frombed)  match_seq    score(fromfimo)    p-value q-value    rest_of_homer_annotation...
GenoMax wrote (5.2 years ago):

Disclaimer: Column headers are going to get messed up in this solution (or you could remove them beforehand and then run the command).

$ join <( sort fimo_output) <(sort bed_file) | tac | awk '{print $1,$7,$8,$9,$11,$6,$3,$4,$5}' | column -t

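The field numbers in the awk part depend on how many columns your files have. After the join, $1 is the peak ID, $2 to $6 are the remaining FIMO columns (strand, score, p-value, q-value, match_seq), and the annotation columns start at $7 (chr), so the strand from the bed file is $10 and the peak score is $11. Add or reorder fields to match the additional columns in your bed file.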

Based on a solution here.
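
If the files are strictly tab-separated and some annotation fields contain spaces (the "Peak Score" header is one example), a keyed awk lookup may be easier to control than join. The following is only a sketch and not part of the original answer: the file names, the made-up coordinates for MACS_peak_625, and the "Annotation" column values are assumptions for illustration. It loads the HOMER table by peak ID, then streams the FIMO hits and prints the columns in the order asked for in the question. Peaks missing from either file are dropped, as with join.

```shell
#!/bin/sh
# Sketch: merge FIMO hits with a HOMER annotation table on the peak ID (column 1).
# File names and sample rows below are illustrative assumptions, not real results.
set -eu
workdir=$(mktemp -d)

# Hypothetical tab-separated FIMO table: name, strand, score, p-value, q-value, match_seq
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#name' strand score p-value q-value match_seq \
  MACS_peak_625 - 25.5902 1.93e-09 0.00498 GAGTTCACCGAGTTCA \
  MACS_peak_4881 + 25.4426 2.41e-09 0.00498 GAGTTCACTGAGTTCA \
  > "$workdir/fimo.tsv"

# Hypothetical tab-separated HOMER table; the last column stands in for
# "rest of annotation columns".
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  PeakID Chr Start End Strand 'Peak Score' Annotation \
  MACS_peak_625 chr1 1000 1200 + 100.00 intron \
  MACS_peak_9638 chr17 39985383 39985583 + 529.00 promoter \
  > "$workdir/homer.tsv"

merged=$(awk -F'\t' -v OFS='\t' '
  NR == FNR {                      # first file: HOMER, one row per peak
    if (FNR == 1) next             # skip the header line
    coords[$1] = $2 OFS $3 OFS $4 OFS $5 OFS $6
    rest = ""
    for (i = 7; i <= NF; i++) rest = rest OFS $i
    annot[$1] = rest               # remaining annotation columns, if any
    next
  }
  FNR > 1 && ($1 in coords) {      # second file: FIMO, one output line per hit
    print $1, coords[$1], $6, $3, $4, $5 annot[$1]
  }
' "$workdir/homer.tsv" "$workdir/fimo.tsv")

printf '%s\n' "$merged"
rm -rf "$workdir"
```

Because the HOMER table is the one held in memory, a peak with several FIMO hits produces one output line per hit, in the original FIMO order.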


giroudpaul replied (5.2 years ago):

Thank you, it worked!