Question: parsing tabular blast output
2.1 years ago, nh75 wrote:

Hello to all!

I have seen that Biopython recommends XML output for parsing BLAST results. However, I need to run my BLAST with -outfmt 7.

How can I parse a file with tabular output? What I want is to list, at the end of each line, the queries associated with each organism hit. Input:

Query= R1_sam10_filt_denovo_18-02-17_c1    cov=4.55 len=204 gc=41.67 nseq=6
ref|XR_001550329.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XR_001550328.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XM_006654251.2|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XM_006654251.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...  48.2    0.064
Query= R1_sam10_filt_denovo_18-02-17_c2    cov=4.54 len=198 gc=52.78 nseq=6
emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...  48.2    0.064

Desired output:

emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...   c1     c2
ref|XR_001550329.1|  PREDICTED: Oryza brachyantha beta-glucosidas...   c1

Thanks for your answers!

2.1 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

This is not exactly the output you asked for (but anyway, that output format is a bad one :-) ):

$ awk '/^Query=/ {n=split($2,a,/_/);Q=a[n];next;} {print $1,$2, Q;} ' input.blast

ref|XR_001550329.1| PREDICTED: c1
ref|XR_001550328.1| PREDICTED: c1
ref|XM_006654251.2| PREDICTED: c1
ref|XM_006654251.1| PREDICTED: c1
emb|LN590686.1| Cyprinus c1
emb|LN590686.1| Cyprinus c2

Thank you very much, Pierre! That's almost what I expected. Do you have an idea how to 'merge' the last two lines of the final output and label them 'c1, c2'? (I have already grepped the original BLAST output.)

Reply written 2.1 years ago by nh75

Sure: a simple awk script, groupby, datamash, ...

But it's usually a bad idea to reformat such data.

Reply written 2.1 years ago by Pierre Lindenbaum (modified 2.1 years ago)

OK, with groupby:

bedtools groupby -i output.txt -grp 1 -c 2 -o collapse

What would the equivalent command be in awk?
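One possible awk version, as a sketch rather than a tested pipeline. It reads the BLAST summary directly instead of the intermediate file, keeps the first-seen order of the hits, and writes tab-separated lines. The file name collapsed.tsv is made up for the example, and the script assumes that the last two whitespace-separated fields of each hit line are the score and the e-value, so everything between the hit ID and those two fields is treated as the description.

```shell
# Sample input copied from the question
cat > input.blast <<'EOF'
Query= R1_sam10_filt_denovo_18-02-17_c1    cov=4.55 len=204 gc=41.67 nseq=6
ref|XR_001550329.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XR_001550328.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XM_006654251.2|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
ref|XM_006654251.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...  48.2    0.064
Query= R1_sam10_filt_denovo_18-02-17_c2    cov=4.54 len=198 gc=52.78 nseq=6
emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...  48.2    0.064
EOF

awk -v OFS='\t' '
/^Query=/ {                      # query header: keep the last "_"-separated token (c1, c2, ...)
    n = split($2, a, /_/); q = a[n]; next
}
NF >= 4 {                        # hit line: ID, description words, score, e-value (assumed layout)
    id = $1
    if (seen[id, q]++) next      # skip a hit already recorded for this query
    if (!(id in queries)) {
        d = $2                   # rebuild the description from fields 2 .. NF-2
        for (i = 3; i <= NF - 2; i++) d = d " " $i
        desc[id] = d
        order[++k] = id          # remember first-seen order of hit IDs
        queries[id] = q
    } else {
        queries[id] = queries[id] "," q
    }
}
END {
    for (i = 1; i <= k; i++) print order[i], desc[order[i]], queries[order[i]]
}
' input.blast > collapsed.tsv
```

With the sample above, the Cyprinus carpio hit ends up on a single line whose last column is `c1,c2`, and each Oryza hit keeps `c1`.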

Other questions related to the output file:

  • How can I get tabs between fields?
  • How can I print the query information (cov=4.55 len=204 gc=41.67 nseq=6) into a new file?
Reply written 2.1 years ago by nh75 (modified 2.1 years ago)
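For the two questions about the output file, a minimal sketch. In awk, the commas in a `print` statement are replaced by the output field separator, so setting `OFS` to a tab with `-v OFS='\t'` is probably the smallest change to the command from the answer above. The query information can go to a second file by printing only the `Query=` lines. The file names hits.tsv and queries.tsv are invented for the example, and the sample input is a shortened copy of the one in the question.

```shell
# Shortened sample input taken from the question
cat > input.blast <<'EOF'
Query= R1_sam10_filt_denovo_18-02-17_c1    cov=4.55 len=204 gc=41.67 nseq=6
ref|XR_001550329.1|  PREDICTED: Oryza brachyantha beta-glucosidas...  53.6    0.002
Query= R1_sam10_filt_denovo_18-02-17_c2    cov=4.54 len=198 gc=52.78 nseq=6
emb|LN590686.1|  Cyprinus carpio genome assembly common carp geno...  48.2    0.064
EOF

# Same command as in the answer above, but with OFS set to a tab
awk -v OFS='\t' '/^Query=/ {n=split($2,a,/_/); Q=a[n]; next} {print $1, $2, Q}' input.blast > hits.tsv

# Query name plus its key=value annotations, one query per line, tab-separated
awk -v OFS='\t' '/^Query=/ {
    line = $2
    for (i = 3; i <= NF; i++) line = line OFS $i
    print line
}' input.blast > queries.tsv
```

Each line of queries.tsv then holds the full query name followed by the `cov=`, `len=`, `gc=` and `nseq=` fields, however many of them the header happens to carry.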