Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score that simultaneously takes into account the two reads of a pair and, in the case of RNA-seq, locates the candidate introns and adds up the scores of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read pairing is ignored.
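The difference between composite scoring and per-exon scoring can be illustrated with a toy sketch. This is not the Magic-BLAST implementation: the ExonHit structure, the function names, the example scores, and the assumption that the composite score is a plain sum over the exons of both mates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExonHit:
    """One aligned exon segment of a read (hypothetical structure)."""
    read_start: int
    read_end: int
    score: int


def composite_score(mate1: List[ExonHit], mate2: List[ExonHit]) -> int:
    """Score the pair as a whole.

    Assumption for illustration only: the composite score is the plain
    sum of the exon scores of both mates.
    """
    return sum(e.score for e in mate1) + sum(e.score for e in mate2)


def best_single_exon_score(mate1: List[ExonHit], mate2: List[ExonHit]) -> int:
    """Per-exon view: each segment is a separate hit, so only the best one counts."""
    return max(e.score for e in mate1 + mate2)


# Mate 1 spans an intron and aligns as two exons; mate 2 aligns as one exon.
mate1 = [ExonHit(0, 60, 60), ExonHit(60, 100, 40)]
mate2 = [ExonHit(0, 100, 100)]

total = composite_score(mate1, mate2)
best = best_single_exon_score(mate1, mate2)
print(total, best)
```

In this toy case the spliced mate contributes its full length to the pair's score, whereas a per-exon view would only ever see its 60-base or 40-base fragment.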
Magic-BLAST incorporates, within the NCBI BLAST code framework, ideas developed in the NCBI Magic pipeline. In particular, it uses hit extension by local walk and jump, which is faster than Smith-Waterman extension (http://www.ncbi.nlm.nih.gov/pubmed/26109056), and recursive clipping of mismatches near the edges of the reads, which avoids accumulating artefactual mismatches near splice sites and is needed to distinguish short indels from substitutions near the edges.
We call the whole next-generation run (from Illumina, Roche-454, ABI, or another sequencing platform, excluding SOLiD) a query. The input reads may be provided as an SRA accession or as a file in SRA, FASTA, FASTQ, or FASTC format. Read pairs can be presented as parallel files or as successive reads in a single file.
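The two ways of presenting read pairs can be sketched as follows. This is a minimal illustration that works on already-parsed read records (plain strings here), not on real FASTA or FASTQ files; the function names are hypothetical and do not correspond to anything inside Magic-BLAST.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Tuple

_MISSING = object()  # sentinel marking the shorter of two parallel inputs


def pairs_from_parallel(reads: Iterable[str],
                        mates: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Parallel files: the i-th read of one file pairs with the i-th of the other."""
    for read, mate in zip_longest(reads, mates, fillvalue=_MISSING):
        if read is _MISSING or mate is _MISSING:
            raise ValueError("parallel inputs contain different numbers of reads")
        yield read, mate


def pairs_from_interleaved(reads: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Single file: each pair is stored as two successive reads."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, 2))
        if not chunk:
            return
        if len(chunk) == 1:
            raise ValueError("odd number of reads in an interleaved input")
        yield chunk[0], chunk[1]


parallel = list(pairs_from_parallel(["r1", "r2"], ["m1", "m2"]))
interleaved = list(pairs_from_interleaved(["r1", "m1", "r2", "m2"]))
print(parallel == interleaved)
```

Both layouts carry the same information; the only requirement is that the order of reads is consistent, which is why the sketch rejects inputs with unmatched reads.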
The reference genome or transcriptome can be given as a BLAST database or a FASTA file. It is preferable to use a BLAST database for large genomes, such as human, or transcript collections, such as all of RefSeq, Ensembl, or AceView. The procedure for creating a BLAST database is described below.
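As a rough sketch of how these input choices combine, the helper below assembles a magicblast argument list. The helper itself and all file names are hypothetical. The option names used (-query, -query_mate, -paired, -sra, -infmt, -db, -subject) are the ones we understand the magicblast command line to accept; they should be checked against the output of magicblast -help for the installed version.

```python
from typing import List, Optional


def magicblast_args(reference: str,
                    reference_is_db: bool,
                    query: Optional[str] = None,
                    sra: Optional[str] = None,
                    query_mate: Optional[str] = None,
                    interleaved_pairs: bool = False,
                    infmt: Optional[str] = None) -> List[str]:
    """Build an argument list for magicblast (hypothetical helper)."""
    if (query is None) == (sra is None):
        raise ValueError("give exactly one of 'query' (a file) or 'sra' (an accession)")
    if query_mate and interleaved_pairs:
        raise ValueError("pairs are either in parallel files or interleaved, not both")

    args = ["magicblast"]
    # Reads come either from an SRA accession or from a local file.
    args += ["-sra", sra] if sra else ["-query", query]
    if query_mate:
        args += ["-query_mate", query_mate]   # mates in a parallel file
    if interleaved_pairs:
        args.append("-paired")                # mates as successive reads
    if infmt:
        args += ["-infmt", infmt]             # e.g. "fastq" for FASTQ input
    # Reference is either a BLAST database or a FASTA file.
    args += ["-db" if reference_is_db else "-subject", reference]
    return args


cmd = magicblast_args("my_reference", True, query="reads.fastq", infmt="fastq")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once the reference and read files exist; building it as a list rather than a single string avoids shell quoting problems with file names.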