7.1 years ago
nh75
Hello to all!
I have seen that Biopython recommends XML output for parsing BLAST results. However, I need to run my BLAST with -outfmt 7.
How can I parse a file with tabular output? What I want is to list, at the end of each line, the queries associated with each organism hit. Input:
Query= R1_sam10_filt_denovo_18-02-17_c1 cov=4.55 len=204 gc=41.67 nseq=6
ref|XR_001550329.1| PREDICTED: Oryza brachyantha beta-glucosidas... 53.6 0.002
ref|XR_001550328.1| PREDICTED: Oryza brachyantha beta-glucosidas... 53.6 0.002
ref|XM_006654251.2| PREDICTED: Oryza brachyantha beta-glucosidas... 53.6 0.002
ref|XM_006654251.1| PREDICTED: Oryza brachyantha beta-glucosidas... 53.6 0.002
emb|LN590686.1| Cyprinus carpio genome assembly common carp geno... 48.2 0.064
Query= R1_sam10_filt_denovo_18-02-17_c2 cov=4.54 len=198 gc=52.78 nseq=6
emb|LN590686.1| Cyprinus carpio genome assembly common carp geno... 48.2 0.064
Desired output:
emb|LN590686.1| Cyprinus carpio genome assembly common carp geno... c1 c2
ref|XR_001550329.1| PREDICTED: Oryza brachyantha beta-glucosidas... c1
Thanks for your answers!
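One possible way to get from the input above to the desired output is sketched below in plain Python. It assumes the file looks exactly like the excerpt shown (a `Query=` line followed by hit lines whose last two columns are the score and E-value), and that the query label is whatever follows the last underscore of the query name. The function name `group_queries_by_hit` and the `sample` variable are made up for illustration; this is not a Biopython API.

```python
from collections import defaultdict

def group_queries_by_hit(lines):
    """Map each hit description to the query labels that matched it."""
    hits = defaultdict(list)
    label = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Query="):
            # "Query= R1_..._c1 cov=4.55 ..." -> "c1" (assumed naming scheme)
            query_id = line.split()[1]
            label = query_id.rsplit("_", 1)[-1]
        elif label is not None:
            # Drop the two trailing columns (score and E-value)
            description = line.rsplit(None, 2)[0]
            if label not in hits[description]:
                hits[description].append(label)
    return hits

sample = """\
Query= R1_sam10_filt_denovo_18-02-17_c1 cov=4.55 len=204 gc=41.67 nseq=6
ref|XR_001550329.1| PREDICTED: Oryza brachyantha beta-glucosidas... 53.6 0.002
emb|LN590686.1| Cyprinus carpio genome assembly common carp geno... 48.2 0.064
Query= R1_sam10_filt_denovo_18-02-17_c2 cov=4.54 len=198 gc=52.78 nseq=6
emb|LN590686.1| Cyprinus carpio genome assembly common carp geno... 48.2 0.064
"""

grouped = group_queries_by_hit(sample.splitlines())
for description, labels in grouped.items():
    print(description, " ".join(labels))
```

For a real file, the same function could be fed an open file handle instead of `sample.splitlines()`.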
Thank you very much, Pierre! That's almost what I expected. Do you have an idea how to 'merge' the last two lines of the final output and label them 'c1, c2'? (I have already grepped the original BLAST output.)
Sure: a simple awk, groupby, datamash ...
But it's usually a bad idea to reformat such data.
OK with groupby.
What would the command be in awk?
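Not an awk command, but the same merge step can be sketched in Python with `itertools.groupby`. This assumes the intermediate file has one tab-separated pair per line (hit description, then query label); that layout is a guess, since the intermediate output is not shown in the thread. The name `merge_labels` is hypothetical.

```python
from itertools import groupby

def merge_labels(rows):
    """Collapse (hit, label) rows sharing a hit into one row with joined labels."""
    # Assumed layout: "<hit description>\t<label>"; split on the last tab
    pairs = [row.rstrip("\n").rsplit("\t", 1) for row in rows if row.strip()]
    # groupby only merges adjacent rows, so sort by hit first
    pairs.sort(key=lambda pair: pair[0])
    merged = []
    for hit, group in groupby(pairs, key=lambda pair: pair[0]):
        # dict.fromkeys removes duplicate labels while keeping input order
        labels = list(dict.fromkeys(label for _, label in group))
        merged.append(f"{hit}\t{', '.join(labels)}")
    return merged

rows = [
    "emb|LN590686.1| Cyprinus carpio genome assembly common carp geno...\tc1",
    "ref|XR_001550329.1| PREDICTED: Oryza brachyantha beta-glucosidas...\tc1",
    "emb|LN590686.1| Cyprinus carpio genome assembly common carp geno...\tc2",
]

merged = merge_labels(rows)
for line in merged:
    print(line)
```

Sorting before grouping matters here: without it, the two `emb|LN590686.1|` rows would not be adjacent and would stay on separate lines.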
Another question related to the output file: