Modifying NCBI identifiers in BLAST output
Asked by adityabandla, 4.2 years ago

Hi

After running blastx, I have the alignment results in a BLAST tabular (.m8) file, with lines that look like this:

HISEQ:329:HMKF3BCXX:1:1101:4293:5950/1 gi|753197404|ref|WP_041503856.1| 54.3    81  37  0   6   248 141 221 4.4e-18 99.0

I would like to simplify the NCBI identifiers in the second column, i.e. keep only the accession numbers in the BLAST output file, so that each line looks like this:

HISEQ:329:HMKF3BCXX:1:1101:4293:5950/1 WP_041503856.1 54.3  81  37  0   6   248 141 221 4.4e-18 99.0

Thanks

Answer by Pierre, 4.2 years ago:
awk -F '\t' -v OFS='\t' '{split($2, a, /\|/); $2 = a[4]; print}' input.tsv > output.tsv
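
A slightly more defensive variant of the same awk approach, wrapped in a shell function, might look like the sketch below. The function name simplify_subject_ids is hypothetical, and so is the choice to print a row unchanged when column 2 has fewer than four |-separated fields (for example, a subject ID that is already a bare accession). It assumes the file is tab-separated, as BLAST tabular output is.

```shell
# Keep only the accession (4th '|'-separated field, e.g. WP_041503856.1)
# in column 2 of a tab-separated BLAST tabular (.m8) file.
# Assumption: rows whose column 2 has fewer than 4 such fields are
# printed unchanged.
simplify_subject_ids() {
  awk -F '\t' -v OFS='\t' '{
    n = split($2, parts, "[|]")   # gi|753197404|ref|WP_041503856.1| -> 5 parts
    if (n >= 4) $2 = parts[4]     # assigning $2 rebuilds $0 with OFS (tab)
    print
  }' "$@"
}
```

Usage: simplify_subject_ids input.tsv > output.tsv, or pipe the blastx output straight into it. Using "[|]" rather than a bare "|" as the split separator keeps the pipe literal in any awk implementation.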

Thanks Pierre! Much appreciated

On a related note, I am trying to simplify FASTA headers that contain the same kind of identifier, i.e. gi|753197404|ref|WP_041503856.1|

I am currently removing the gi numbers (keeping only the accession) using

sed -E 's/^[^ ]*[|]([^|]*)[|] .*$/>\1/'

Is there a faster alternative using awk?
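
One possible awk equivalent is sketched below, assuming every header looks like >gi|753197404|ref|WP_041503856.1| followed by an optional description. Like the sed command, it keeps only the accession and drops the description; unlike the sed, it also handles headers with nothing after the final |. Sequence lines, and headers whose first word has fewer than four |-separated fields, are printed unchanged. The function name simplify_fasta_headers is hypothetical, and whether this is actually faster than sed depends on your awk implementation and data, so it is worth timing both on your own file.

```shell
# Rewrite FASTA headers such as
#   >gi|753197404|ref|WP_041503856.1| some description
# to
#   >WP_041503856.1
# Sequence lines and headers with fewer than 4 '|'-separated fields
# in their first word are passed through unchanged.
simplify_fasta_headers() {
  awk '
    /^>/ {
      id = substr($1, 2)               # first word of the header, minus ">"
      n = split(id, parts, "[|]")
      if (n >= 4) { print ">" parts[4]; next }
    }
    { print }                          # everything else is copied as-is
  ' "$@"
}
```

Usage: simplify_fasta_headers input.fasta > output.fasta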
