Question: Parsing AME tsv file
12 weeks ago, rbronste wrote:

I am trying to find a quick and easy way to parse an AME-generated sequences.tsv file and pull out just a 3-column BED file. The format looks as follows. Any ideas would be awesome, thanks!

motif_DB  motif_ID  seq_ID                           FASTA_score  PWM_score  class
Jaspar    MA0004.1  chr5:144788829-144789179_shuf_2  2183         12.7135    fp
Jaspar    MA0004.1  chr5:112339537-112339887_shuf_1  1713         12.7131    tp
Jaspar    MA0004.1  chr16:94739915-94740265_shuf_1   1668         12.712     tp
12 weeks ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
grep -v ^motif in.tsv | cut -f 3 | cut -d '_' -f 1 | tr ":-" "\t"
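
To make the stages of this one-liner easier to follow, here is a minimal sketch that runs it on the three rows from the question. The temporary file and the variable names (`sample`, `bed`) are assumptions for illustration only, and the sketch assumes the real file is tab-separated with a header line starting with `motif`.

```shell
# Hypothetical sample file reproducing the three rows from the question (tab-separated).
sample=$(mktemp)
{
  printf 'motif_DB\tmotif_ID\tseq_ID\tFASTA_score\tPWM_score\tclass\n'
  printf 'Jaspar\tMA0004.1\tchr5:144788829-144789179_shuf_2\t2183\t12.7135\tfp\n'
  printf 'Jaspar\tMA0004.1\tchr5:112339537-112339887_shuf_1\t1713\t12.7131\ttp\n'
  printf 'Jaspar\tMA0004.1\tchr16:94739915-94740265_shuf_1\t1668\t12.712\ttp\n'
} > "$sample"

# 1. grep -v '^motif' : drop the header line
# 2. cut -f 3         : keep only the seq_ID column
# 3. cut -d '_' -f 1  : strip the "_shuf_N" suffix
# 4. tr ':-' '\t\t'   : turn "chr:start-end" into three tab-separated fields
bed=$(grep -v '^motif' "$sample" | cut -f 3 | cut -d '_' -f 1 | tr ':-' '\t\t')

printf '%s\n' "$bed"
rm -f "$sample"
```

On this sample the result is one chrom/start/end line per input row, with both fp and tp rows included, since nothing here looks at the class column.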

Thanks, very helpful! Is there also a way to include only the true-positive sequences (tp in the final column) in the output BED?

Reply written 12 weeks ago by rbronste

That will be another grep or awk in the command :)
I think you can figure out how to do that?

Reply written 12 weeks ago by WouterDeCoster

Maybe a hint? :) Not as familiar with awk, though trying to learn.

Reply written 10 weeks ago by rbronste

I would add another grep to get lines with tp, prior to cut.
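
As a sketch of that suggestion, the filter can sit between the header removal and the first cut. The sample file and the variable names (`sample`, `tp_grep`, `tp_awk`) are assumptions for illustration. The awk variant is not part of the suggestion above; it is shown only as a stricter option, because `grep -w tp` matches the word tp anywhere on the line, while awk can test the class column alone.

```shell
# Hypothetical sample file reproducing the three rows from the question (tab-separated).
sample=$(mktemp)
{
  printf 'motif_DB\tmotif_ID\tseq_ID\tFASTA_score\tPWM_score\tclass\n'
  printf 'Jaspar\tMA0004.1\tchr5:144788829-144789179_shuf_2\t2183\t12.7135\tfp\n'
  printf 'Jaspar\tMA0004.1\tchr5:112339537-112339887_shuf_1\t1713\t12.7131\ttp\n'
  printf 'Jaspar\tMA0004.1\tchr16:94739915-94740265_shuf_1\t1668\t12.712\ttp\n'
} > "$sample"

# Variant 1: keep lines containing the whole word "tp", then cut as before.
tp_grep=$(grep -v '^motif' "$sample" | grep -w tp \
  | cut -f 3 | cut -d '_' -f 1 | tr ':-' '\t\t')

# Variant 2: test the class column (field 6) exactly; NR > 1 skips the header.
tp_awk=$(awk -F '\t' 'NR > 1 && $6 == "tp"' "$sample" \
  | cut -f 3 | cut -d '_' -f 1 | tr ':-' '\t\t')

printf '%s\n' "$tp_grep"
rm -f "$sample"
```

Both variants give the same two intervals on this sample; they would only differ if "tp" appeared as a separate word in some other column.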

Reply written 10 weeks ago by WouterDeCoster

OK, figured it out. This seems to work for true-positive intervals with specific motif IDs:

grep -v ^motif sequences.tsv | grep -w tp | grep -w MA0258.2 | cut -f 2,3,6 | cut -d '_' -f 1 | tr ":-" "\t" | head

MA0258.2    chr12   15566967    15567317    tp
MA0258.2    chr11   88155633    88155983    tp
MA0258.2    chr15   51071410    51071760    tp
MA0258.2    chr14   22151488    22151838    tp

Thanks for your help.
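
Two details of the pipeline above may be worth keeping in mind. `cut -d '_' -f 1` splits the whole line, so when a seq_ID carries a `_shuf_N` suffix, every column after it (including class) is dropped; and in `grep -w MA0258.2` the dot is a regex wildcard. A possible single-pass alternative in awk is sketched below. It compares the motif ID and class columns exactly, splits only the seq_ID field, and prints chrom, start, end first, as BED expects, with the motif ID as a fourth column. The sample rows use the intervals shown above, but their scores, the fp label, the `_shuf_N` suffixes, and the names `sample` and `bed4` are made up for illustration.

```shell
# Hypothetical sample: intervals from the thread, placeholder scores and labels.
sample=$(mktemp)
{
  printf 'motif_DB\tmotif_ID\tseq_ID\tFASTA_score\tPWM_score\tclass\n'
  printf 'Jaspar\tMA0258.2\tchr12:15566967-15567317\t100\t10.0\ttp\n'
  printf 'Jaspar\tMA0258.2\tchr11:88155633-88155983_shuf_1\t90\t9.0\tfp\n'
  printf 'Jaspar\tMA0004.1\tchr5:112339537-112339887_shuf_1\t1713\t12.7131\ttp\n'
  printf 'Jaspar\tMA0258.2\tchr15:51071410-51071760_shuf_1\t80\t8.0\ttp\n'
} > "$sample"

bed4=$(awk -F '\t' -v OFS='\t' -v id='MA0258.2' '
  $2 == id && $6 == "tp" {
    split($3, part, "_")          # part[1] is "chrom:start-end" without any suffix
    split(part[1], pos, /[-:]/)   # pos[1]=chrom, pos[2]=start, pos[3]=end
    print pos[1], pos[2], pos[3], $2
  }' "$sample")

printf '%s\n' "$bed4"
rm -f "$sample"
```

The header needs no separate grep here, because its second field is `motif_ID` and never equals the requested ID.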

Reply modified 10 weeks ago • written 10 weeks ago by rbronste

Thanks, very helpful!

Used this to get the following:

grep -v ^motif sequences.tsv | cut -f 2,3  | cut -d '_' -f 1 | tr ":-" "\t" 

    MA0258.2    chr12   15566967    15567317    tp
    MA0258.2    chr11   88155633    88155983    tp
    MA0258.2    chr15   51071410    51071760    tp
    MA0258.2    chr14   22151488    22151838    tp

However, I can't quite figure out how to select only specific motif_IDs in the .tsv file, and only the tp (true positive) rows for those motif IDs.

Reply modified 10 weeks ago • written 10 weeks ago by rbronste