Blastn for converting the circular IDs
8 weeks ago
seta ★ 1.7k

Dear all,

I’m analyzing GSE207093, a circular RNA microarray dataset generated on the Arraystar platform. As I could not find any way to convert the Arraystar probe IDs to circAtlas IDs, I used blastn to align the Arraystar probe sequences (60 bp long) against the circAtlas circular RNA database (sequence lengths 31-2000 bp). However, I have a question about the blastn output.

I used the following command:

blastn -task blastn-short -query rat_seq.fa -db circ_db -out out.txt -evalue 1e-5 -max_target_seqs 1 -num_threads 4 -outfmt 6


With the above command, one of my circRNAs of interest (rno_circRNA_014621) matched rno-Ralgapa1_0048:

query    subject     %id     alignment length    mismatches  gap openings    query start     query end   subject start   subject end     Evalue  bit score
rno_circRNA_014621  rno-Ralgapa1_0048   100 37  0   0   1   37  549 585 1.02E-13    73.8


However, when I used -max_target_seqs 10 instead of 1, I obtained 10 sequences matching rno_circRNA_014621, all with the same identity, E-value, and bit score.

query    subject     %id     alignment length    mismatches  gap openings    query start     query end   subject start   subject end     Evalue  bit score
rno_circRNA_014621  rno-Ralgapa1_0048   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0045   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0019   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0040   100 37  0   0   1   37  197 233 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0038   100 37  0   0   1   37  204 240 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0050   100 37  0   0   1   37  120 156 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0035   100 37  0   0   1   37  204 240 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0054   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0028   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0006   100 37  0   0   1   37  549 585 1.02E-13    73.8
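Since the ten hits tie on every score, a minimal sketch like the following could count, for each query, how many hits share the highest bit score. It assumes tab-separated -outfmt 6 output with the bit score in column 12; the function name count_top_ties is hypothetical:

```shell
# count_top_ties (hypothetical helper): reads BLAST -outfmt 6 lines on stdin
# and prints, for each query, the number of hits tied at its highest bit score.
count_top_ties() {
  LC_ALL=C awk -F'\t' '
    # New query, or a higher bit score than seen so far: reset the tie count.
    !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; n[$1] = 0 }
    # Count every hit whose bit score equals the current best for its query.
    $12 + 0 == best[$1] { n[$1]++ }
    END { for (q in n) printf "%s\t%d\n", q, n[q] }
  '
}
```

Run as count_top_ties < out.txt; for the ten rows above it would report rno_circRNA_014621 with a count of 10.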


I used the following command to extract the best hit from the above results:

sort -k1,1 -k12,12gr -k11,11g -k3,3gr blastout.txt | sort -u -k1,1 --merge > bestHits
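With GNU sort, when all the given keys (-k12,12gr, -k11,11g, -k3,3gr) compare equal, the whole line is compared as a last resort. Among the ten tied hits, the line whose subject ID sorts first lexically, rno-Ralgapa1_0006, therefore comes first, and sort -u -k1,1 --merge keeps only that first line per query. A minimal sketch that keeps every hit tied at the top bit score instead of picking one could look like this; it assumes tab-separated -outfmt 6 input with the bit score in column 12, and the function name all_best_hits is hypothetical:

```shell
# all_best_hits (hypothetical helper): keeps every hit whose bit score equals
# the highest bit score for its query (input: BLAST -outfmt 6 lines on stdin).
all_best_hits() {
  # Sort by query, then by bit score (general numeric, descending).
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k12,12gr |
    awk -F'\t' '
      # The first line of each query carries its highest bit score.
      $1 != q { q = $1; top = $12 + 0 }
      # Print every line tied with that score (default action prints the line).
      $12 + 0 == top
    '
}
```

For example, all_best_hits < blastout.txt would list all ten Ralgapa1 hits for rno_circRNA_014621 rather than only rno-Ralgapa1_0006.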


This returned rno-Ralgapa1_0006 as the best hit for rno_circRNA_014621. My question is: why is rno-Ralgapa1_0006 chosen rather than another equally scored hit, such as rno-Ralgapa1_0048 or rno-Ralgapa1_0035? And how can I make sure that the BLAST output, i.e. the circAtlas IDs obtained, is correct?
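One check on whether a match can be trusted is how much of the 60-nt probe the alignment actually covers; in the hits above only query positions 1-37 align. A partial match like this could, for example, arise if a probe spans a back-splice junction that the linear circAtlas sequence splits between its end and its start, but that is an assumption to check against the Arraystar probe design, not something the output above shows. A minimal sketch that appends per-hit probe coverage, assuming every probe is 60 nt (the function name probe_coverage and the PROBE_LEN variable are hypothetical; running blastn with -outfmt "6 std qlen" would give the real query length in column 13 instead):

```shell
# probe_coverage (hypothetical helper): appends to each BLAST -outfmt 6 line
# the fraction of the probe covered by the alignment, (qend - qstart + 1) / length.
# PROBE_LEN is a hypothetical variable; it defaults to the 60 nt probe length.
probe_coverage() {
  LC_ALL=C awk -F'\t' -v OFS='\t' -v len="${PROBE_LEN:-60}" '
    { print $0, sprintf("%.2f", ($8 - $7 + 1) / len) }
  '
}
```

For the hits above this gives 0.62. Hits could then be filtered with, e.g., probe_coverage < out.txt | awk -F'\t' '$13 >= 0.9', where the 0.9 cutoff is only an illustration.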

Any suggestions for obtaining the correct circAtlas ID would be highly appreciated.

Thanks

blastn microarray RNA circular-ID