This question will seem INCREDIBLY similar to the question here, but it's not the same.
I ran diamond blastx on a large dataset with the output options -f 6 qseqid sseqid bitscore.
This gives me a file that looks like:

OHJ07_1_contig_1  ELU09376.1      66.6
OHJ07_1_contig_1  KFM76682.1      65.9
OHJ07_1_contig_1  JAT94707.1      63.2
OHJ07_1_contig_1  JAT94707.1      46.6
OHJ07_1_contig_1  XP_002400485.1  62.4
OHJ07_1_contig_1  XP_002400485.1  46.2
OHJ07_1_contig_1  XP_014787375.1  61.6
OHJ07_1_contig_1  XP_014787375.1  40.4
OHJ07_1_contig_1  KOF67573.1      61.6

And I'd like to get taxIds for each of these accession versions. The previous question, linked above, seems to only work on accession numbers, not accession version numbers.
I have also downloaded the NCBI prot.accession2taxid file, and tried to grep the accession version numbers back to this, but the output is in the order of the NCBI file, not mine. The NCBI file looks like

accession  accession.version  taxid  gi
P29373     P29373.2           9606   132401
P22935     P22935.2           10090  132402
P18902     P18902.1           9913   132403
P02753     P02753.3           9606   62298174
P27485     P27485.2           9823   3041715
P06912     P06912.2           9986   1710096

My grep output (head) looks like this:

P07201  P07201.2  6584    266918
P21329  P21329.1  7221    134082
P21328  P21328.1  7227    134083
P04571  P04571.1  126592  134316
P04572  P04572.1  6357    134318
P19217  P19217.1  9913    135052
P17248  P17248.3  9913    110283011

So obviously, I can't just paste the grep output with my diamond blast output.
Eventually I need a file that looks like my blast output, but with taxids in place of the accession version numbers, i.e.

OHJ07_1_contig_1  283909  66.6
OHJ07_1_contig_1  407821  65.9

The way I see it, there are two approaches: either somehow use the accession version numbers with the script provided here by Pierre and Steve, or sort my grep output from the NCBI file so that it follows the order of my blast output, which would let me paste the two files together and continue with the rest of the analysis.
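I have tried the latter, i.e. getting the grep output into the order of my file, but my script (below) is hideously slow, because it scans the whole NCBI file once per accession version, and it will take about 3 months to finish the whole job:

while read -r line; do grep -m 1 -F "$line" prot.accession2taxid; done < test > taxIds
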
Any help would be appreciated!
EDIT: The "test" file I use in the above "solution" is just the column of accession versions pulled out of my blast output file.
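
For illustration, here is a minimal sketch of the kind of single-pass lookup that would produce the output described above, run on toy data rather than the real files. It assumes both files are tab-separated, that the accession.version is column 2 and the taxid column 3 of the NCBI file, and that the set of distinct subject IDs from the blast output fits in memory. The file names (blast_toy.tsv, acc2taxid_toy.tsv, blast_taxid_toy.tsv), the contig names, the accession X00000.1, and the "NA" placeholder are all made up for the example; the three NCBI rows are the ones quoted above.

```shell
# Toy stand-in for the diamond output (qseqid, sseqid, bitscore).
# Contig names and the unmatched accession X00000.1 are hypothetical.
printf 'contig_A\tP22935.2\t66.6\ncontig_A\tP29373.2\t65.9\ncontig_B\tP22935.2\t40.4\ncontig_B\tX00000.1\t30.0\n' > blast_toy.tsv

# Toy stand-in for prot.accession2taxid, using rows quoted in the question.
printf 'accession\taccession.version\ttaxid\tgi\nP29373\tP29373.2\t9606\t132401\nP22935\tP22935.2\t10090\t132402\nP18902\tP18902.1\t9913\t132403\n' > acc2taxid_toy.tsv

# pass 1 (blast file): record which accession.versions are actually needed
# pass 2 (NCBI file):  store the taxid only for those accession.versions
# pass 3 (blast file): print rows in their original order, column 2 replaced
awk 'BEGIN { FS = OFS = "\t" }
     pass == 1 { need[$2] = 1; next }
     pass == 2 { if ($2 in need) tax[$2] = $3; next }
     { $2 = ($2 in tax) ? tax[$2] : "NA"; print }' \
    pass=1 blast_toy.tsv pass=2 acc2taxid_toy.tsv pass=3 blast_toy.tsv \
    > blast_taxid_toy.tsv

cat blast_taxid_toy.tsv
```

Each file is read once from start to finish, so the order of the output follows the blast file, not the NCBI file. Accession versions with no entry in the mapping file are written as "NA" here instead of being dropped, so the line count stays the same as in the input.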