Question: Parsing AME tsv file
19 months ago, rbronste wrote:

I am trying to find a quick and easy way to parse an AME-generated true-positive sequences.tsv file and pull out just a 3-column BED. The format looks as follows. Any ideas would be awesome, thanks!

motif_DB  motif_ID  seq_ID                           FASTA_score  PWM_score  class
Jaspar    MA0004.1  chr5:144788829-144789179_shuf_2  2183         12.7135    fp
Jaspar    MA0004.1  chr5:112339537-112339887_shuf_1  1713         12.7131    tp
Jaspar    MA0004.1  chr16:94739915-94740265_shuf_1   1668         12.712     tp
Tags: bed, motif, meme, tsv, ame
19 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
grep -v ^motif in.tsv | cut -f 3 | cut -d '_' -f 1 | tr ":-" "\t"

Thanks, very helpful! Is there also a way to include only the true-positive sequences (tp in the final column) in the output BED?

Reply written 19 months ago by rbronste

That will be another grep or awk in the command :)
I think you can figure out how to do that?

Reply written 19 months ago by WouterDeCoster

Maybe a hint? :) I'm not that familiar with awk, though I'm trying to learn.

Reply written 18 months ago by rbronste

I would add another grep to get lines with tp, prior to cut.

Reply written 18 months ago by WouterDeCoster
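
A minimal sketch of that hint, assuming the file is tab-separated (which the cut -f in the answer above already relies on) and that the class is the last column of each row. The function name tp_to_bed is made up for illustration. The extra grep sits before the first cut, because once only column 3 is kept the class column is gone and can no longer be filtered on.

```shell
# Hypothetical helper: tp_to_bed FILE
# Prints a 3-column BED (chrom, start, end) for rows whose last column is "tp".
tp_to_bed() {
    # 1. drop the header line
    # 2. keep rows that end in "tp" (class is assumed to be the last column)
    # 3. keep only seq_ID, strip everything after the first "_"
    # 4. turn "chr:start-end" into three tab-separated fields
    grep -v '^motif' "$1" \
        | grep -E '[[:space:]]tp[[:space:]]*$' \
        | cut -f 3 \
        | cut -d '_' -f 1 \
        | tr ':-' '\t\t'
}

# Usage (not run here): tp_to_bed sequences.tsv > tp.bed
```

Anchoring the pattern to the end of the line keeps a stray "tp" elsewhere in the row from matching.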

OK, figured it out. This seems to work for true-positive intervals with a specific motif ID:

grep -v ^motif sequences.tsv | grep -w tp | grep -w MA0258.2 | cut -f 2,3,6 | cut -d '_' -f 1 | tr ":-" "\t" | head

MA0258.2    chr12   15566967    15567317    tp
MA0258.2    chr11   88155633    88155983    tp
MA0258.2    chr15   51071410    51071760    tp
MA0258.2    chr14   22151488    22151838    tp

Thanks for your help.

Reply modified 18 months ago • written 18 months ago by rbronste
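
For comparison, a single-pass awk sketch of the same filter. It compares the motif_ID and class columns exactly, whereas grep -w matches the word anywhere on the line and treats the "." in MA0258.2 as a wildcard. The function name motif_tp_bed is hypothetical, the column positions are taken from the header shown in the question, and the _shuf_N suffix pattern is an assumption based on the sample rows. It prints only the three BED columns asked for originally.

```shell
# Hypothetical helper: motif_tp_bed FILE MOTIF_ID
# Prints chrom, start, end for rows where motif_ID ($2) equals MOTIF_ID
# and class ($6) is exactly "tp". The header row never matches both tests.
motif_tp_bed() {
    awk -F '\t' -v OFS='\t' -v motif="$2" '
        $2 == motif && $6 == "tp" {
            id = $3
            sub(/_shuf_[0-9]+$/, "", id)   # assumed suffix on shuffled sequences
            split(id, loc, /[:-]/)         # chr:start-end -> chr, start, end
            print loc[1], loc[2], loc[3]
        }' "$1"
}

# Usage (not run here): motif_tp_bed sequences.tsv MA0258.2 > MA0258.2.tp.bed
```

Stripping only a trailing _shuf_N, rather than cutting at the first underscore, leaves sequence names that themselves contain underscores intact.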

Thanks, very helpful!

Used this to get the following:

grep -v ^motif sequences.tsv | cut -f 2,3,6 | cut -d '_' -f 1 | tr ":-" "\t"

    MA0258.2    chr12   15566967    15567317    tp
    MA0258.2    chr11   88155633    88155983    tp
    MA0258.2    chr15   51071410    51071760    tp
    MA0258.2    chr14   22151488    22151838    tp

However, I can't quite figure out how to select only specific motif_IDs in the .tsv file, and only the tp (true positive) rows for those motif IDs.

Reply modified 18 months ago • written 18 months ago by rbronste