One possible pipeline for reporting inexact matches of FASTA-formatted queries in a genome:
BBmap (from BBTools) | sam2bed (from BEDOPS)
Steps and syntax that might be applicable for my case:
Step 1. Build a searchable index of the reference genome to be scanned for matches
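Step 1 is listed without its syntax. A minimal sketch, assuming BBmap's usual indexing invocation: `ref=` points at the FASTA to index and `path=` sets the directory in which BBmap creates its `ref/` index folder (the current directory by default). The function name `build_bbmap_index`, the `BBMAP` override variable and the file names are hypothetical placeholders:

```shell
# Hypothetical override so a specific BBTools install can be used; defaults to bbmap.sh on $PATH
BBMAP=${BBMAP:-bbmap.sh}

# Step 1 sketch: build the BBmap index for the reference genome once.
# Ref.fasta and bbmap_index are placeholder names.
build_bbmap_index() {
  local ref_fasta=$1
  local index_dir=${2:-.}   # BBmap's default is the current working directory
  # ref= names the FASTA to index; path= is where BBmap creates its ref/ index directory
  "$BBMAP" ref="$ref_fasta" path="$index_dir"
}

# Usage:
#   build_bbmap_index Ref.fasta bbmap_index
#   bbmap.sh path=bbmap_index in=Query.fasta ...   # plus the Step 2 flags
```

Pointing the Step 2 `bbmap.sh` command at the same `path=` should let it reuse this index rather than expecting one in the current directory.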
Step 2. Map the queries against the reference genome index with
bbmap.sh minid=0.8 idfilter=0.8 secondary=T maxsites=500 in=Query.fasta out=Query_Vs_Ref_BBmap_minid_idfilter_80per_multi.sam -noheader
Explanation of the flags
minid = approximate minimum identity to search for; higher values make BBmap faster but less sensitive (paraphrasing Brian Bushnell, BBmap's author)
idfilter = "to filter out alignments with under exactly x% identity" (quoting Brian Bushnell)
secondary = also reports secondary alignments, not just the primary (best) match
maxsites = maximum number of alignments (matching genomic regions) reported per query; only relevant when secondary is enabled
noheader = suppresses the SAM header lines, since I didn't want output files bulkier than required
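Since secondary, maxsites and idfilter all shape what ends up in the SAM file, a per-query summary is one way to QC whether they behaved as intended (e.g. no query above 500 alignments, no identity below 0.8). A sketch, assuming the CIGAR strings use = and X operations (BBmap's default SAM 1.4 output) and defining identity as matches / (matches + mismatches + inserted + deleted bases), which may differ slightly from BBmap's own idfilter calculation. `sam_qc` is a hypothetical helper name:

```shell
# QC sketch (hypothetical helper): per query, count alignments, count secondary
# alignments (FLAG 0x100) and report the lowest approximate identity.
# Assumes =/X CIGAR operations (BBmap's default SAM 1.4 output);
# identity = matches / (matches + mismatches + inserted + deleted bases).
sam_qc() {
  awk -F'\t' -v OFS='\t' '
    /^@/ { next }                       # header lines, if any
    int($2 / 4) % 2 == 1 { next }       # FLAG 0x4: unmapped query, no alignment
    {
      q = $1
      if (!(q in total)) order[++n] = q # keep queries in first-seen order
      total[q]++
      if (int($2 / 256) % 2 == 1) secondary[q]++
      eq = 0; diff = 0; cigar = $6
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op == "=") eq += len               # matching bases
        else if (op ~ /[XID]/) diff += len     # mismatched and indel bases
        cigar = substr(cigar, RLENGTH + 1)
      }
      id = (eq + diff > 0) ? eq / (eq + diff) : 0
      if (!(q in low_id) || id < low_id[q]) low_id[q] = id
    }
    END {
      print "query", "alignments", "secondary", "min_identity"
      for (i = 1; i <= n; i++) {
        q = order[i]
        printf "%s\t%d\t%d\t%.3f\n", q, total[q], secondary[q], low_id[q]
      }
    }'
}
```

Usage: `sam_qc < Query_Vs_Ref_BBmap_minid_idfilter_80per_multi.sam`. A query whose alignment count equals 500 has probably hit the maxsites ceiling.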
Step 3. Convert output SAM file to BED format
sam2bed < Query_Vs_Ref_BBmap_minid_idfilter_80per_multi.sam > Coords_file.bed
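For QC against control cases it helps to know how BED coordinates follow from the SAM fields: SAM POS is 1-based while BED start is 0-based, and the end is the start plus the number of reference bases consumed by the CIGAR (M, D, N, = and X operations). The awk sketch below illustrates that arithmetic only; it is not BEDOPS' implementation, and its BED6 column layout (MAPQ in the score column) is an assumption that may not match sam2bed's exact output. `sam_to_bed` is a hypothetical name:

```shell
# Illustrative SAM-to-BED coordinate arithmetic (hypothetical helper, not BEDOPS code).
# Output columns (assumed layout): chrom, start, end, query ID, MAPQ, strand.
sam_to_bed() {
  awk -F'\t' -v OFS='\t' '
    /^@/ { next }                        # header lines, if any
    int($2 / 4) % 2 == 1 { next }        # FLAG 0x4: unmapped, no coordinates
    {
      span = 0; cigar = $6
      # add up CIGAR operations that consume reference bases: M, D, N, = and X
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        if (substr(cigar, RLENGTH, 1) ~ /[MDN=X]/) span += len
        cigar = substr(cigar, RLENGTH + 1)
      }
      start = $4 - 1                     # SAM POS is 1-based, BED start is 0-based
      strand = (int($2 / 16) % 2 == 1) ? "-" : "+"   # FLAG 0x10: reverse strand
      print $3, start, start + span, $1, $5, strand
    }'
}
```

For example, POS 101 with CIGAR 10=1X4=2D5= spans 22 reference bases, giving BED start 100 and end 122.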
Step 4. Parse the BED file into a report of match coordinates
Use an awk, Perl or Python script, or a BBmap reporting option in step 2 itself
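A minimal awk sketch of Step 4, assuming the BED file carries the query ID in column 4 and the strand in column 6 (the standard BED6 positions). It groups hits per query and converts back to 1-based, inclusive coordinates for a human-readable report. `bed_report` is a hypothetical name:

```shell
# Step 4 sketch (hypothetical helper): one line per query, listing all its hits.
# Assumes BED6-like input: chrom, start, end, query ID, score, strand.
bed_report() {
  awk -F'\t' '
    {
      q = $4
      locus = $1 ":" ($2 + 1) "-" $3 "(" $6 ")"   # 1-based, inclusive coordinates
      if (q in hits) hits[q] = hits[q] "," locus
      else { hits[q] = locus; order[++n] = q }   # keep queries in first-seen order
    }
    END { for (i = 1; i <= n; i++) print order[i] "\t" hits[order[i]] }'
}
```

Usage: `bed_report < Coords_file.bed > match_report.tsv` (the output file name is a placeholder).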
AFAIK, NGS read mappers like BBmap were not designed with this goal in mind, so I still have to QC this little pipeline with control cases. Treat this as a placeholder; I will edit the details if and when necessary.
Software versions: BBTools (BBMap) 38.35; BEDOPS 2.4.35 (typical)
Downloaded from: https://sourceforge.net/projects/bbmap/ and https://bedops.readthedocs.io/en/latest/index.html
PS. Thanks to Brian Bushnell for advising use of the idfilter flag during the BBmap step.
Perhaps Brian can render the final verdict on the validity of this syntax?! Though I haven't seen any posts from him in over a year now...
Note: this is the solution to the problem I described in the now-closed thread "BLAST run parameters and parsing advice".