Blastn for converting the circular IDs
17 months ago
seta ★ 1.9k

Dear all,

I’m analyzing GSE207093, a circular RNA microarray dataset generated on the Arraystar platform. Since I could not find any way to convert the Arraystar probe IDs to circAtlas IDs, I ran blastn with the Arraystar probe sequences (60 bp in length) against the circAtlas circular RNA database (sequence length range: 31-2000 bp). However, I have a question about the blastn output.

I used the command below:

blastn -task blastn-short -query rat_seq.fa -db circ_db -out out.txt -evalue 1e-5 -max_target_seqs 1 -num_threads 4 -outfmt 6

With the above command, one of my circRNAs of interest (rno_circRNA_014621) matched rno-Ralgapa1_0048.

query    subject     %id     alignment length    mismatches  gap openings    query start     query end   subject start   subject end     Evalue  bit score
rno_circRNA_014621  rno-Ralgapa1_0048   100 37  0   0   1   37  549 585 1.02E-13    73.8

However, when I used -max_target_seqs 10 instead of 1, I obtained 10 sequences matching rno_circRNA_014621, all with the same identity, E-value, and bit score.

query    subject     %id     alignment length    mismatches  gap openings    query start     query end   subject start   subject end     Evalue  bit score
rno_circRNA_014621  rno-Ralgapa1_0048   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0045   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0019   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0040   100 37  0   0   1   37  197 233 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0038   100 37  0   0   1   37  204 240 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0050   100 37  0   0   1   37  120 156 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0035   100 37  0   0   1   37  204 240 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0054   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0028   100 37  0   0   1   37  549 585 1.02E-13    73.8
rno_circRNA_014621  rno-Ralgapa1_0006   100 37  0   0   1   37  549 585 1.02E-13    73.8
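As a side check on the rows above: the alignment covers 37 of the probe's 60 bases, so each hit matches only part of the probe. Below is a minimal Python sketch of that arithmetic, assuming outfmt 6 rows in the default column order and a fixed 60 bp probe length. The names `parse_hit`, `query_coverage`, and `PROBE_LENGTH` are illustrative and not part of BLAST.

```python
# Default column order of BLAST tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

PROBE_LENGTH = 60  # assumption: every Arraystar probe is 60 nt, as stated in the post


def parse_hit(line):
    """Split one whitespace-separated outfmt 6 line into a dict of strings."""
    return dict(zip(OUTFMT6_FIELDS, line.split()))


def query_coverage(hit, probe_length=PROBE_LENGTH):
    """Fraction of the probe covered by the alignment (qstart..qend, 1-based inclusive)."""
    aligned = int(hit["qend"]) - int(hit["qstart"]) + 1
    return aligned / probe_length


row = "rno_circRNA_014621\trno-Ralgapa1_0048\t100\t37\t0\t0\t1\t37\t549\t585\t1.02E-13\t73.8"
hit = parse_hit(row)
print(f"{hit['sseqid']}: {query_coverage(hit):.1%} of the probe aligned")
```

Hits that cover only 37 of 60 bases may deserve a closer look before they are accepted as an ID mapping.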

I used the command below to extract the best hit from the above results:

sort -k1,1 -k12,12gr -k11,11g -k3,3gr blastout.txt | sort -u -k1,1 --merge > bestHits

This returned rno-Ralgapa1_0006 as the best hit for rno_circRNA_014621. My question is: why are the other hits, such as rno-Ralgapa1_0048 or rno-Ralgapa1_0035, not the best hit? How can I make sure that the BLAST output, that is, the circAtlas IDs I obtained, is correct?
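One way to check whether a "best hit" is really unique is to group the hits by query and list every subject that shares the top bit score. The sketch below does that in Python on three rows from the output above; `top_hits` is an illustrative helper, not a BLAST feature, and it assumes header-free outfmt 6 input. When several subjects tie on every sort key, which one a sort-based pipeline keeps depends only on its tie-breaking rule, not on alignment quality. As far as I know, GNU sort orders fully tied lines by a whole-line comparison unless -s is given, which would be consistent with rno-Ralgapa1_0006 coming out first.

```python
from collections import defaultdict


def top_hits(lines):
    """Return {query: [subjects tied for the highest bit score]}.

    Assumes outfmt 6 rows without a header:
    column 1 = query, column 2 = subject, column 12 = bit score.
    """
    by_query = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        by_query[fields[0]].append((fields[1], float(fields[11])))

    tied = {}
    for query, hits in by_query.items():
        best = max(score for _, score in hits)
        tied[query] = sorted(subject for subject, score in hits if score == best)
    return tied


rows = [
    "rno_circRNA_014621 rno-Ralgapa1_0048 100 37 0 0 1 37 549 585 1.02E-13 73.8",
    "rno_circRNA_014621 rno-Ralgapa1_0035 100 37 0 0 1 37 204 240 1.02E-13 73.8",
    "rno_circRNA_014621 rno-Ralgapa1_0006 100 37 0 0 1 37 549 585 1.02E-13 73.8",
]

for query, subjects in top_hits(rows).items():
    status = "unique" if len(subjects) == 1 else f"{len(subjects)}-way tie"
    print(query, status, ",".join(subjects))
```

A query reported as a tie has no single best hit by score alone, so choosing among the tied circAtlas IDs would need some other criterion.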

Any suggestions for obtaining the right circAtlas ID would be highly appreciated.

Thanks

blastn microarray RNA circular-ID • 301 views