Number of BLAST hits loaded into Blast2GO does not match the number of hits in the diamond output XML file
4.9 years ago
MSM55 ▴ 150

I ran diamond blastx on 150062 sequences with .xml output. However, when I load the file into Blast2GO, only 69032 of them are imported. While troubleshooting, I found that the .xml file itself contains unrecognisable characters (as shown in the image below), which is probably the cause.

This issue is somewhat related to the one reported on the diamond GitHub page here; however, the solution suggested there does not help in my case.

How can I get rid of these symbols?

diamond blastx blast2go .xml • 1.7k views

Maybe your unwanted character is different from the one reported in that issue? Instead of pasting an image, paste a snippet of the output as text; then we may be able to identify the unwanted characters. I can't read them in the image.
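One way to identify the unwanted characters without relying on a screenshot is to count the control bytes in the file. The following is only a sketch: the function names `count_illegal_bytes` and `scan_file` are made up for illustration, and it assumes the problem characters are ASCII control bytes below 0x20 (other than tab, LF and CR), which XML 1.0 does not allow. It does not check for any other kind of invalid content.

```python
from collections import Counter

# Assumption: the troublesome characters are C0 control bytes.
# XML 1.0 allows only tab (0x09), LF (0x0A) and CR (0x0D) below 0x20.
ILLEGAL = frozenset(range(0x20)) - {0x09, 0x0A, 0x0D}


def count_illegal_bytes(data: bytes) -> Counter:
    """Map each disallowed control byte value found in data to its count."""
    return Counter(b for b in data if b in ILLEGAL)


def scan_file(path: str, chunk_size: int = 1 << 20) -> Counter:
    """Scan a file in chunks so a large XML file is never fully in memory."""
    totals = Counter()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            totals.update(count_illegal_bytes(chunk))
    return totals


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python scan_xml.py your.blast.xml
    if len(sys.argv) > 1:
        for value, n in sorted(scan_file(sys.argv[1]).items()):
            print(f"0x{value:02x}: {n}")
```

Each line of output would give a byte value in hex and how often it occurs, which is easier to compare against the GitHub issue than an image.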


What is the output of the following command?

cat your.blast.xml |  tr "><" "\n" | grep NP_267385 -m1 |  hexdump -C | head -n 20


Here is the output:

00000000  52 69 64 41 20 66 61 6d  69 6c 79 20 70 72 6f 74  |RidA family prot|
00000010  65 69 6e 20 5b 4c 61 63  74 6f 63 6f 63 63 75 73  |ein [Lactococcus|
00000020  20 6c 61 63 74 69 73 5d  01 4e 50 5f 32 36 37 33  | lactis].NP_2673|
00000030  38 35 2e 31 20 41 6c 64  52 20 5b 4c 61 63 74 6f  |85.1 AldR [Lacto|
00000040  63 6f 63 63 75 73 20 6c  61 63 74 69 73 20 73 75  |coccus lactis su|
00000050  62 73 70 2e 20 6c 61 63  74 69 73 20 49 6c 31 34  |bsp. lactis Il14|
00000060  30 33 5d 01 4f 33 34 31  33 33 2e 32 20 52 65 63  |03].O34133.2 Rec|
00000070  4e 61 6d 65 3a 20 46 75  6c 6c 3d 50 75 74 61 74  |Name: Full=Putat|
00000080  69 76 65 20 72 65 67 75  6c 61 74 6f 72 20 41 6c  |ive regulator Al|
00000090  64 52 01 41 41 4b 30 35  33 32 37 2e 31 20 72 65  |dR.AAK05327.1 re|
000000a0  67 75 6c 61 74 6f 72 79  20 70 72 6f 74 65 69 6e  |gulatory protein|
000000b0  20 41 6c 64 52 20 5b 4c  61 63 74 6f 63 6f 63 63  | AldR [Lactococc|
000000c0  75 73 20 6c 61 63 74 69  73 20 73 75 62 73 70 2e  |us lactis subsp.|
000000d0  20 6c 61 63 74 69 73 20  49 6c 31 34 30 33 5d 01  | lactis Il1403].|
000000e0  41 44 5a 36 33 38 34 30  2e 31 20 74 72 61 6e 73  |ADZ63840.1 trans|
000000f0  6c 61 74 69 6f 6e 20 69  6e 69 74 69 61 74 69 6f  |lation initiatio|
00000100  6e 20 69 6e 68 69 62 69  74 6f 72 20 5b 4c 61 63  |n inhibitor [Lac|
00000110  74 6f 63 6f 63 63 75 73  20 6c 61 63 74 69 73 20  |tococcus lactis |
00000120  73 75 62 73 70 2e 20 6c  61 63 74 69 73 20 43 56  |subsp. lactis CV|
00000130  35 36 5d 01 45 48 45 39  33 33 37 39 2e 31 20 68  |56].EHE93379.1 h|