Question: Parsing AME tsv file
27 days ago, rbronste wrote:

I am trying to find a quick and easy way to parse an AME-generated sequences.tsv file and pull out just a 3-column BED file. The format looks as follows. Any ideas would be awesome, thanks!

motif_DB  motif_ID  seq_ID   FASTA_score PWM_score  class

Jaspar  MA0004.1    chr5:144788829-144789179_shuf_2     2183    12.7135 fp
Jaspar  MA0004.1    chr5:112339537-112339887_shuf_1     1713    12.7131  tp
Jaspar  MA0004.1    chr16:94739915-94740265_shuf_1      1668    12.712   tp
Tags: bed, motif, meme, tsv, ame
27 days ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
grep -v ^motif in.tsv | cut -f 3 | cut -d '_' -f 1 | tr ":-" "\t"

Thanks, very helpful! Is there also a way to include only the true-positive sequences (tp in the final column) in the output BED?

written 27 days ago by rbronste

That will be another grep or awk in the command :)
I think you can figure out how to do that?

written 27 days ago by WouterDeCoster

Maybe a hint? :) Not as familiar with awk, though trying to learn.

written 10 days ago by rbronste

I would add another grep to get lines with tp, prior to cut.

written 10 days ago by WouterDeCoster
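
The hint above, one more grep for tp placed before the cut steps, could be sketched as follows. This is a minimal sketch rather than any poster's exact command: the file names sample.tsv and tp.bed are made up for illustration, the sample rows are copied from the question, and the file is assumed to be tab-separated.

```shell
# Build a small tab-separated sample that mirrors the rows in the question.
# sample.tsv and tp.bed are hypothetical file names used only for this sketch.
printf 'motif_DB\tmotif_ID\tseq_ID\tFASTA_score\tPWM_score\tclass\n' > sample.tsv
printf 'Jaspar\tMA0004.1\tchr5:144788829-144789179_shuf_2\t2183\t12.7135\tfp\n' >> sample.tsv
printf 'Jaspar\tMA0004.1\tchr5:112339537-112339887_shuf_1\t1713\t12.7131\ttp\n' >> sample.tsv
printf 'Jaspar\tMA0004.1\tchr16:94739915-94740265_shuf_1\t1668\t12.712\ttp\n' >> sample.tsv

# 1. drop the header line
# 2. keep only lines containing "tp" as a whole word
# 3. take the seq_ID column and strip everything from the first "_" onward
# 4. turn "chr:start-end" into three tab-separated BED columns
grep -v '^motif' sample.tsv \
  | grep -w tp \
  | cut -f 3 \
  | cut -d '_' -f 1 \
  | tr ':-' '\t\t' > tp.bed

cat tp.bed
```

Here grep -w tp matches tp as a whole word anywhere on the line, which is enough for these rows because no other column contains that word; a filter on the sixth column specifically would be stricter.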

OK, figured it out. This seems to work for true-positive intervals with a specific motif ID:

grep -v ^motif sequences.tsv | grep -w tp | grep -w MA0258.2 | cut -f 2,3,6 | cut -d '_' -f 1 | tr ":-" "\t" | head

MA0258.2    chr12   15566967    15567317    tp
MA0258.2    chr11   88155633    88155983    tp
MA0258.2    chr15   51071410    51071760    tp
MA0258.2    chr14   22151488    22151838    tp

Thanks for your help.

modified 10 days ago • written 10 days ago by rbronste
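
For readers who prefer a column-aware filter, the same selection could be sketched in a single awk call. Two details of the grep pipeline above motivate this: grep -w MA0258.2 treats the dot as "any character", and cut -d '_' -f 1 discards everything after the first underscore on the line, including any columns that follow seq_ID. The sketch below compares the motif_ID and class columns as exact strings. It rests on assumptions: the file names sample2.tsv and motif_tp.bed are hypothetical, the score values in the sample rows are placeholders, and seq_ID is assumed to end in a _shuf_N suffix as in the question.

```shell
# Hypothetical sample: coordinates follow the thread, scores are placeholders.
printf 'motif_DB\tmotif_ID\tseq_ID\tFASTA_score\tPWM_score\tclass\n' > sample2.tsv
printf 'Jaspar\tMA0258.2\tchr12:15566967-15567317_shuf_1\t100\t10.0\ttp\n' >> sample2.tsv
printf 'Jaspar\tMA0258.2\tchr11:88155633-88155983_shuf_1\t90\t9.0\tfp\n' >> sample2.tsv
printf 'Jaspar\tMA0004.1\tchr5:112339537-112339887_shuf_1\t1713\t12.7131\ttp\n' >> sample2.tsv

# Keep rows whose motif_ID (column 2) equals the requested ID and whose
# class (column 6) is "tp". The header row fails both tests, so it is
# skipped without a separate grep.
awk -F '\t' -v OFS='\t' -v id='MA0258.2' '
    $2 == id && $6 == "tp" {
        sub(/_shuf_[0-9]+$/, "", $3)   # strip the assumed "_shuf_N" suffix
        split($3, p, /[:-]/)           # p[1]=chrom, p[2]=start, p[3]=end
        print p[1], p[2], p[3]
    }' sample2.tsv > motif_tp.bed

cat motif_tp.bed
```

Stripping only a trailing _shuf_N, rather than everything after the first underscore, also leaves chromosome names that themselves contain underscores intact.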

Thanks, very helpful!

Used this to get the following:

grep -v ^motif sequences.tsv | cut -f 2,3  | cut -d '_' -f 1 | tr ":-" "\t" 

    MA0258.2    chr12   15566967    15567317    tp
    MA0258.2    chr11   88155633    88155983    tp
    MA0258.2    chr15   51071410    51071760    tp
    MA0258.2    chr14   22151488    22151838    tp

However, I can't quite figure out how to select only specific motif_IDs in the .tsv file, and only the tp (true positive) rows for those motif IDs.

modified 10 days ago • written 11 days ago by rbronste