Question: Identify point mutations from each read in sam files
0
gravatar for DVA
3.9 years ago by
DVA530
United States
DVA530 wrote:

Hello,

I wonder if there is an easy way to look at point mutations from SAM files. I understand there are mutation calling software such as GATK, but I need to look at each individual read, not the sequence region as a whole. 

The CIGAR column in SAM does not have any information about point mutations, and the TAG field does not seem to help either. My current method is just to align each read back to the reference using the coordinates given, but I would appreciate any other better method.

Thanks everyone!

mutations sam • 3.1k views
ADD COMMENTlink modified 3.9 years ago by Pierre Lindenbaum122k • written 3.9 years ago by DVA530
1

You can use the MD field in the BAM file that stores information about mismatching positions. If your bam files doesn't have MD tag, you can generate it using samtools calmd function. Check these relevant posts: Position Of Mismatches Per Read From A Sam/Bam FileHow To Find Out The Mismatchs Of An Alignment Entry In The Sam File?

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by Ashutosh Pandey11k

Thank you! I didn't know samtools could do that. I really appreciate your help!

ADD REPLYlink written 3.9 years ago by DVA530
4
gravatar for Pierre Lindenbaum
3.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

use my tool sam2tsv https://github.com/lindenb/jvarkit/wiki/SAM2Tsv

 

$ java -jar dist/sam2tsv.jar -A  \
    -r samtools-0.1.18/examples/toy.fa 
      samtools-0.1.18/examples/toy.sam
r001    163 ref 0   T   .   7   T   M
r001    163 ref 1   T   .   8   T   M
r001    163 ref 2   A   .   9   A   M
r001    163 ref 3   G   .   10  G   M
r001    163 ref 4   A   .   11  A   M
r001    163 ref 5   T   .   12  T   M
r001    163 ref 6   A   .   13  A   M
r001    163 ref 7   A   .   14  A   M
r001    163 ref 8   A   .   .   .   I
r001    163 ref 9   G   .   .   .   I
r001    163 ref 10  A   .   .   .   I
r001    163 ref 11  G   .   .   .   I
r001    163 ref 12  G   .   15  G   M
r001    163 ref 13  A   .   16  A   M
r001    163 ref 14  T   .   17  T   M
r001    163 ref 15  A   .   18  A   M
r001    163 ref .   .   .   19  G   D

(...)

ADD COMMENTlink written 3.9 years ago by Pierre Lindenbaum122k

Thanks so much! Just one more question: the output for single mismatches is marked by "M", just like exact matches. (e.g.: In your output example of sam2tsv website: r003 0 ref 2 C . 11 A M) - does your program offer a way to distinguish these from exact matches? 

ADD REPLYlink written 3.9 years ago by DVA530

The M is the 'M' of the cigar string. You can use awk to test if the column (REF-base)==column(READ-base)  . Or you can use https://github.com/lindenb/jvarkit/wiki/SamFixCigar to changed the 'M' to 'X' or '='

ADD REPLYlink written 3.9 years ago by Pierre Lindenbaum122k

Thank you. The java version used is 1.7.0_65, and other requirements on your websites are also fulfilled. However,  "make sam2tsv" gave me the following errors:

### COMPILING sam2tsv ######
mkdir -p /dir/jvarkit/jvarkit/_tmp-1.133/META-INF /dir/jvarkit/jvarkit/dist-1.133 galaxy-bundle/jvarkit
#create galaxy
rm -f galaxy-bundle/sam2tsv.xml
xsltproc --path /dir/jvarkit/jvarkit/src/main/resources/xml --output galaxy-bundle/jvarkit/sam2tsv.xml --stringparam name "sam2tsv" --stringparam class "com.github.lindenb.jvarkit.tools.sam2tsv.Sam2Tsv" --stringparam classpath "commons-jexl-2.1.1.jar commons-logging-1.1.1.jar htsjdk-1.133.jar snappy-java-1.0.3-rc3.jar sam2tsv.jar" --stringparam version  `cat  /dir/jvarkit/jvarkit/.git/refs/heads/master ` /dir/jvarkit/jvarkit/src/main/resources/xsl/tools2galaxy.xsl /dir/jvarkit/jvarkit/src/main/resources/xml/tools.xml || echo "XSLT failed (ignored)"
XSLT failed (ignored)
cp galaxy-bundle/jvarkit/sam2tsv.xml /dir/jvarkit/jvarkit/_tmp-1.133/META-INF/galaxy.xml
cp: cannot stat `galaxy-bundle/jvarkit/sam2tsv.xml': No such file or directory
make: [sam2tsv] Error 1 (ignored)
#generate java code if needed = a file with .xml exists, requires xsltproc
if [ -f "/dir/jvarkit/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/sam2tsv/Sam2Tsv.xml" ] ; then mkdir -p /dir/jvarkit/jvarkit/src/main/generated-sources/java/com/github/lindenb/jvarkit/tools/sam2tsv/ && xsltproc --xinclude --stringparam githash  `cat  /dir/jvarkit/jvarkit/.git/refs/heads/master ` --path "/dir/jvarkit/jvarkit/src/main/resources/xml" -o /dir/jvarkit/jvarkit/src/main/generated-sources/java/com/github/lindenb/jvarkit/tools/sam2tsv/AbstractSam2Tsv.java /dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl "/dir/jvarkit/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/sam2tsv/Sam2Tsv.xml"  ; fi
/dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl:304: parser error : Opening and ending tag mismatch: template line 279 and if
                </xsl:if>
                         ^
/dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl:325: parser error : Opening and ending tag mismatch: stylesheet line 2 and template
</xsl:template>
               ^
/dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl:328: parser error : Extra content at the end of the document
<xsl:template match="c:option[@type='outFile' and not(@multiple='true')]" mode="
^
cannot parse /dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl
make: *** [sam2tsv] Error 4

####

I also tried re-download and install, but that doesn't help. Could you please let me know what I missed here? Thank you.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by DVA530

Hi Pierre, Thanks again for your help - but I'm having issues with "make sam2tsv" and wonder if you could please give me some hints (see the previous comment). I'm sorry to bother you and I hope it's not a very silly question to waste your time... Thank you very much in advance.

ADD REPLYlink written 3.9 years ago by DVA530

opps ! thank you this is a file I recently modified. It's Fixed now : https://github.com/lindenb/jvarkit/commit/b1ddf8ca859f85caec9c29b4ee62f2364b56902f  . Sorry, download or pull again .

ADD REPLYlink written 3.9 years ago by Pierre Lindenbaum122k

Thanks a lot for the clarification. I'm going to try it out tomorrow:)

##Update:

It works~ Yay! Thanks a lot.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by DVA530

Hi Pierre,

many thanks for this tool! It's very useful also for me.

I have just one question. Is there a way how to get records only for selected locations/snps, not for the whole read? I have quite long list of snps for which I would like to get a record in tsv file. As the tsv file contains all positions of every read covering these snps, it gets really big (hundreds of G). I can post-process the file to select only positions that I am interested in, but it would be much easier if I could limit the positions at the beginning.

As an alternative, I am now splitting my original sam/bam file into a number of small files and I will try to process it in pieces.

I am sorry if my question is naive, I don't have any real bioinformatic background.

Thank you!

ADD REPLYlink written 2.8 years ago by 5ab0

samtools -b view your.bam your.bed | java -jar sam2tsv.jar

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Pierre Lindenbaum122k

Hi Pierre,

thanks for reply. I tried it already but evidently the amount of snps I want to process is way too big. This approach reduces the original bam file to approximately 78% of it's original size.

But I can proceed with genome split in parts.

Thanks

ADD REPLYlink written 2.8 years ago by 5ab0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1281 users visited in the last hour