Question: How to receive mismatch statistic?
0
gravatar for alexander.bykov94
3.0 years ago by
alexander.bykov940 wrote:

Hello everyone. I have a little question here. So I have NGS reads, appropriate reference genome and BED file with list of regions of interest. My final aim is to receive a list with information about mismatches. For example if we have data like this:

refgenseq: TCAATCAGGTGATGGAATTCATAGATTTCCATCCCGATAAATCCGGCACC

NGSread1: ___________GATGaAATTCtTAGATTTC

NGSread2: __________________TTCATAaATTTCCtTCCC

NGSread2: ____________________________TCCATCCCaATAAAT

I would like to receive data like this:

  • number of mismatches: 5
  • A-T subst: 1
  • A-G subst: 0
  • A-C subst: 0
  • A-G subst: 0
  • G-A subst: 2 etc.

I know that CIGAR parameter contain information about substitutions, but when I make sorting my SAM file with mapped reads with BED tool that contains coordinates of reads of interest by "Filter SAM or BAM, output SAM or BAM" option in the SAM-tool, all parameters, including CIGAR are lost. So how can I extract reads which comply with according regions of interest?

And I think SAMStat fits to me but may be is it possible to solve this problem with standard GALAXY sources? I am MAC OS not linux user, so it's a little bit difficult to use all this program out of GALAXY. But again - I have troubles with sorting reads.

Thank you for answering.

snp next-gen • 931 views
ADD COMMENTlink modified 3.0 years ago by Pierre Lindenbaum118k • written 3.0 years ago by alexander.bykov940
1
gravatar for Pierre Lindenbaum
3.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

using sam2tsv : https://github.com/lindenb/jvarkit/wiki/SAM2Tsv you can print the read bases and the ref bases in the genomic window.

samtools view your.bam "22:987-10981" | \
java -jar dist/sam2tsv.jar -r ref.fa | \
cut -f 5,8 | sort | uniq -c

(not tested)

ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by Pierre Lindenbaum118k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 885 users visited in the last hour