Hello everyone. I have a small question. I have NGS reads, the appropriate reference genome, and a BED file listing the regions of interest. My final aim is to obtain a summary of the mismatches. For example, if we have data like this:
refgenseq: TCAATCAGGTGATGGAATTCATAGATTTCCATCCCGATAAATCCGGCACC
NGSread1: ___________GATGaAATTCtTAGATTTC
NGSread2: __________________TTCATAaATTTCCtTCCC
NGSread3: ____________________________TCCATCCCaATAAAT
I would like to receive data like this:
- number of mismatches: 5
- A-T subst: 1
- A-G subst: 0
- A-C subst: 0
- G-A subst: 2 etc.
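For illustration, the kind of tally described above could be sketched in plain Python as follows. This is only a minimal sketch under stated assumptions: the reads are gap-free (no insertions or deletions), each read comes with a 0-based start offset on the reference (the offsets 10, 17 and 27 below are assumptions, chosen so that the example reads differ from the reference only at the lowercase bases), and a substitution is labelled reference base first, read base second. The function name `count_substitutions` is hypothetical.

```python
from collections import Counter

def count_substitutions(reference, reads):
    """Tally substitutions for gap-free reads given as (start, sequence) pairs.

    `start` is the assumed 0-based position of the read's first base on the
    reference. Keys have the form "REF-READ", for example "G-A".
    """
    counts = Counter()
    for start, seq in reads:
        for offset, read_base in enumerate(seq):
            ref_base = reference[start + offset].upper()
            read_base = read_base.upper()  # lowercase only marks mismatches
            if read_base != ref_base:
                counts[f"{ref_base}-{read_base}"] += 1
    return counts

reference = "TCAATCAGGTGATGGAATTCATAGATTTCCATCCCGATAAATCCGGCACC"
reads = [
    (10, "GATGaAATTCtTAGATTTC"),  # assumed start offsets, see lead-in
    (17, "TTCATAaATTTCCtTCCC"),
    (27, "TCCATCCCaATAAAT"),
]

counts = count_substitutions(reference, reads)
print("number of mismatches:", sum(counts.values()))
for kind, n in sorted(counts.items()):
    print(f"{kind} subst: {n}")
```

With real data the start position and the read sequence would come from the POS and SEQ columns of each SAM record, and reads containing indels would need their CIGAR string taken into account before comparing bases.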
As far as I understand, the CIGAR field contains information about substitutions. However, when I filter my SAM file of mapped reads against the BED file with the coordinates of the regions of interest, using the "Filter SAM or BAM, output SAM or BAM" option of the SAM tools, all fields, including CIGAR, are lost. So how can I extract the reads that fall within the regions of interest?
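To make the filtering step concrete, here is a rough sketch of selecting reads that overlap BED regions while passing every SAM line through unchanged, so that CIGAR and the optional tags survive. It assumes an uncompressed SAM text input and a standard BED file (0-based, half-open coordinates); the function names and the tiny example records are hypothetical, made up only for this illustration.

```python
import re

def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines (0-based, half-open)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions

def reference_span(pos, cigar):
    """Return the 0-based half-open reference interval covered by an alignment."""
    start = pos - 1  # SAM POS is 1-based
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Only M, D, N, = and X consume reference bases.
    length = sum(int(n) for n, op in ops if op in "MDN=X")
    return start, start + length

def filter_sam(sam_lines, regions):
    """Yield header lines and mapped reads overlapping any region, unmodified."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header as-is
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":  # skip unmapped reads
            continue
        start, end = reference_span(pos, cigar)
        hits = regions.get(chrom, [])
        if any(start < r_end and r_start < end for r_start, r_end in hits):
            yield line

# Hypothetical example input
bed = ["chr1\t20\t30"]
sam = [
    "@SQ\tSN:chr1\tLN:50",
    "read1\t0\tchr1\t11\t60\t19M\t*\t0\t0\tGATGAAATTCTTAGATTTC\t*\tNM:i:2\tMD:Z:4G5A8",
    "read2\t0\tchr1\t41\t60\t5M\t*\t0\t0\tATCCG\t*\tNM:i:0\tMD:Z:5",
]
kept = list(filter_sam(sam, parse_bed(bed)))
for line in kept:
    print(line)
```

Two side notes. On the command line, `samtools view -L` with a BED file performs this kind of region filtering. Also, a plain `M` in a CIGAR string covers both matches and mismatches, so the individual substitutions are usually read from the MD tag (when the aligner writes it) or found by comparing the read with the reference, rather than from CIGAR alone.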
I think SAMStat would suit my needs, but is it possible to solve this problem with the standard Galaxy tools? I use macOS, not Linux, so it is a little difficult for me to run all these programs outside Galaxy. But again, my main trouble is with filtering the reads.
Thank you for answering.