rbronste, 7.0 years ago
I am trying to find a quick and easy way to parse a true-positive sequences.tsv file generated by AME and pull out just a 3-column BED. The format looks as follows; any ideas would be awesome, thanks!
motif_DB  motif_ID  seq_ID                            FASTA_score  PWM_score  class
Jaspar    MA0004.1  chr5:144788829-144789179_shuf_2   2183         12.7135    fp
Jaspar    MA0004.1  chr5:112339537-112339887_shuf_1   1713         12.7131    tp
Jaspar    MA0004.1  chr16:94739915-94740265_shuf_1    1668         12.712     tp
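A minimal Python sketch of the conversion, assuming the file is laid out like the rows above: a header line naming the columns, then one row per sequence, with columns separated by tabs or runs of spaces. The function names `seq_id_to_bed` and `ame_to_bed` are hypothetical, and the regular expression assumes every seq_ID has the form `<chrom>:<start>-<end>` with an optional `_shuf_<n>` suffix. Start and end are copied unchanged, which is only correct if the IDs already use BED-style (0-based start) coordinates.

```python
import re

# Assumed seq_ID shape, taken from the example rows: "<chrom>:<start>-<end>",
# optionally followed by a "_shuf_<n>" suffix that is dropped.
SEQ_ID_RE = re.compile(r"^([^:]+):(\d+)-(\d+)(?:_shuf_\d+)?$")


def seq_id_to_bed(seq_id):
    """Split a seq_ID such as 'chr5:144788829-144789179_shuf_2' into BED fields."""
    match = SEQ_ID_RE.match(seq_id)
    if match is None:
        raise ValueError(f"unexpected seq_ID format: {seq_id!r}")
    chrom, start, end = match.groups()
    return chrom, start, end


def ame_to_bed(lines):
    """Return tab-separated 3-column BED lines, one per data row.

    Blank lines and '#' comment lines are skipped; the first remaining
    line is treated as the header that names the columns.
    """
    header = None
    bed = []
    for line in lines:
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields or fields[0].startswith("#"):
            continue
        if header is None:
            header = fields
            continue
        row = dict(zip(header, fields))
        bed.append("\t".join(seq_id_to_bed(row["seq_ID"])))
    return bed
```

For example, `with open("sequences.tsv") as fh: print("\n".join(ame_to_bed(fh)))` prints the BED intervals. Looking up `seq_ID` by header name rather than by position keeps the sketch working if the column order in your file differs from the example.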
Thanks, very helpful! Is there also a way to include only the true-positive sequences (tp in the final column) in the output BED?
That will be another grep or awk in the command :)
I think you can figure out how to do that?
Maybe a hint? :) I'm not very familiar with awk, though I'm trying to learn.
I would add another grep to get lines with tp, prior to cut.
OK, figured it out; it seems to work like this for true-positive intervals with specific motif IDs:
Thanks for your help.
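Along the lines of the grep-before-cut suggestion, here is a Python sketch (not the command used in this thread) of a hypothetical `tp_bed` function that filters on the class column itself. Testing the final field for exact equality with `tp`, as `awk '$NF == "tp"'` would, avoids keeping a line just because the letters "tp" occur somewhere else in it. It assumes seq_ID is the third column, as in the header shown in the question, and that it follows the `<chrom>:<start>-<end>` pattern with an optional `_shuf_<n>` suffix.

```python
import re

# Assumed seq_ID shape, as in the example rows: "<chrom>:<start>-<end>"
# with an optional "_shuf_<n>" suffix.
SEQ_ID_RE = re.compile(r"^([^:]+):(\d+)-(\d+)(?:_shuf_\d+)?$")


def tp_bed(lines):
    """Return 3-column BED lines for rows whose final column is exactly 'tp'.

    The header (last field 'class') and 'fp' rows fail the equality test,
    so they are dropped without any special handling.
    """
    bed = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, '#' comments and every row whose class is not 'tp'.
        if not fields or fields[0].startswith("#") or fields[-1] != "tp":
            continue
        seq_id = fields[2]  # assumed position of seq_ID
        match = SEQ_ID_RE.match(seq_id)
        if match is None:
            raise ValueError(f"unexpected seq_ID format: {seq_id!r}")
        bed.append("\t".join(match.groups()))
    return bed
```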
Thanks, very helpful!
I used this to get the following:
However, I can't quite figure out how to select only specific motif_IDs in the .tsv file, and only the tp (true positive) rows for those motif IDs.
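One way to combine both filters, sketched in Python; `motif_tp_bed` and its `wanted_class` parameter are hypothetical names. The sketch assumes the header names the columns `motif_ID`, `seq_ID` and `class` as in the question, keeps a row only if its motif_ID is in the supplied collection and its class is `tp`, and then splits seq_IDs shaped like `chr5:144788829-144789179_shuf_2` into BED fields. For a single motif, the awk counterpart would be `awk '$2 == "MA0004.1" && $NF == "tp"' sequences.tsv`, again assuming whitespace-separated columns with motif_ID second.

```python
import re

# Assumed seq_ID shape, as in the example rows: "<chrom>:<start>-<end>"
# with an optional "_shuf_<n>" suffix.
SEQ_ID_RE = re.compile(r"^([^:]+):(\d+)-(\d+)(?:_shuf_\d+)?$")


def motif_tp_bed(lines, motif_ids, wanted_class="tp"):
    """Return BED lines for rows whose motif_ID is in motif_ids and whose
    class equals wanted_class.

    Columns are looked up by name from the header line, so the filter does
    not depend on motif_ID being the second column.
    """
    wanted = set(motif_ids)
    header = None
    bed = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if header is None:
            header = fields  # first non-comment line names the columns
            continue
        row = dict(zip(header, fields))
        if row["motif_ID"] in wanted and row["class"] == wanted_class:
            match = SEQ_ID_RE.match(row["seq_ID"])
            if match is None:
                raise ValueError(f"unexpected seq_ID format: {row['seq_ID']!r}")
            bed.append("\t".join(match.groups()))
    return bed
```

For example, `with open("sequences.tsv") as fh: bed = motif_tp_bed(fh, {"MA0004.1"})` collects the tp intervals for MA0004.1 only. Converting `motif_ids` to a set keeps each membership test cheap when many motif IDs are requested.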