Identify point mutations from each read in sam files
1
0
Entering edit mode
5.6 years ago
DVA ▴ 550

Hello,

I wonder if there is an easy way to look at point mutations from SAM files. I understand there are mutation calling software such as GATK, but I need to look at each individual read, not the sequence region as a whole.

The CIGAR column in SAM does not have any information about point mutations, and the TAG field does not seem to help either. My current method is just to align each read back to the reference using the coordinates given, but I would appreciate any other better method.

Thanks everyone!

sam mutations • 4.1k views
1
Entering edit mode

You can use the MD field in the BAM file that stores information about mismatching positions. If your bam files doesn't have MD tag, you can generate it using samtools calmd function. Check these relevant posts: Position Of Mismatches Per Read From A Sam/Bam FileHow To Find Out The Mismatchs Of An Alignment Entry In The Sam File?

0
Entering edit mode

Thank you! I didn't know samtools could do that. I really appreciate your help!

4
Entering edit mode
5.6 years ago

use my tool sam2tsv https://github.com/lindenb/jvarkit/wiki/SAM2Tsv

\$ java -jar dist/sam2tsv.jar -A  \
-r samtools-0.1.18/examples/toy.fa
samtools-0.1.18/examples/toy.sam
r001    163 ref 0   T   .   7   T   M
r001    163 ref 1   T   .   8   T   M
r001    163 ref 2   A   .   9   A   M
r001    163 ref 3   G   .   10  G   M
r001    163 ref 4   A   .   11  A   M
r001    163 ref 5   T   .   12  T   M
r001    163 ref 6   A   .   13  A   M
r001    163 ref 7   A   .   14  A   M
r001    163 ref 8   A   .   .   .   I
r001    163 ref 9   G   .   .   .   I
r001    163 ref 10  A   .   .   .   I
r001    163 ref 11  G   .   .   .   I
r001    163 ref 12  G   .   15  G   M
r001    163 ref 13  A   .   16  A   M
r001    163 ref 14  T   .   17  T   M
r001    163 ref 15  A   .   18  A   M
r001    163 ref .   .   .   19  G   D


(...)

0
Entering edit mode

Thanks so much! Just one more question: the output for single mismatches is marked by "M", just like exact matches. (e.g.: In your output example of sam2tsv website: r003 0 ref 2 C . 11 A M) - does your program offer a way to distinguish these from exact matches?

0
Entering edit mode

The M is the 'M' of the cigar string. You can use awk to test if the column (REF-base)==column(READ-base)  . Or you can use https://github.com/lindenb/jvarkit/wiki/SamFixCigar to changed the 'M' to 'X' or '='

0
Entering edit mode

Thank you. The java version used is 1.7.0_65, and other requirements on your websites are also fulfilled. However, make sam2tsv gave me the following errors:

### COMPILING sam2tsv ######
mkdir -p /dir/jvarkit/jvarkit/_tmp-1.133/META-INF /dir/jvarkit/jvarkit/dist-1.133 galaxy-bundle/jvarkit
#create galaxy
rm -f galaxy-bundle/sam2tsv.xml
xsltproc --path /dir/jvarkit/jvarkit/src/main/resources/xml --output galaxy-bundle/jvarkit/sam2tsv.xml --stringparam name "sam2tsv" --stringparam class "com.github.lindenb.jvarkit.tools.sam2tsv.Sam2Tsv" --stringparam classpath "commons-jexl-2.1.1.jar commons-logging-1.1.1.jar htsjdk-1.133.jar snappy-java-1.0.3-rc3.jar sam2tsv.jar" --stringparam version  cat  /dir/jvarkit/jvarkit/.git/refs/heads/master  /dir/jvarkit/jvarkit/src/main/resources/xsl/tools2galaxy.xsl /dir/jvarkit/jvarkit/src/main/resources/xml/tools.xml || echo "XSLT failed (ignored)"
XSLT failed (ignored)
cp galaxy-bundle/jvarkit/sam2tsv.xml /dir/jvarkit/jvarkit/_tmp-1.133/META-INF/galaxy.xml
cp: cannot stat galaxy-bundle/jvarkit/sam2tsv.xml': No such file or directory
make: [sam2tsv] Error 1 (ignored)
#generate java code if needed = a file with .xml exists, requires xsltproc
if [ -f "/dir/jvarkit/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/sam2tsv/Sam2Tsv.xml" ] ; then mkdir -p /dir/jvarkit/jvarkit/src/main/generated-sources/java/com/github/lindenb/jvarkit/tools/sam2tsv/ && xsltproc --xinclude --stringparam githash  cat  /dir/jvarkit/jvarkit/.git/refs/heads/master  --path "/dir/jvarkit/jvarkit/src/main/resources/xml" -o /dir/jvarkit/jvarkit/src/main/generated-sources/java/com/github/lindenb/jvarkit/tools/sam2tsv/AbstractSam2Tsv.java /dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl "/dir/jvarkit/jvarkit/src/main/java/com/github/lindenb/jvarkit/tools/sam2tsv/Sam2Tsv.xml"  ; fi
/dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl:304: parser error : Opening and ending tag mismatch: template line 279 and if
</xsl:if>
^
/dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl:325: parser error : Opening and ending tag mismatch: stylesheet line 2 and template
</xsl:template>
^
/dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl:328: parser error : Extra content at the end of the document
<xsl:template match="c:option[@type='outFile' and not(@multiple='true')]" mode="
^
cannot parse /dir/jvarkit/jvarkit/src/main/resources/xsl/command2java.xsl
make: *** [sam2tsv] Error 4


I also tried re-download and install, but that doesn't help. Could you please let me know what I missed here? Thank you.

0
Entering edit mode

Hi Pierre, Thanks again for your help - but I'm having issues with "make sam2tsv" and wonder if you could please give me some hints (see the previous comment). I'm sorry to bother you and I hope it's not a very silly question to waste your time... Thank you very much in advance.

0
Entering edit mode

opps ! thank you this is a file I recently modified. It's Fixed now : https://github.com/lindenb/jvarkit/commit/b1ddf8ca859f85caec9c29b4ee62f2364b56902f  . Sorry, download or pull again .

0
Entering edit mode

Thanks a lot for the clarification. I'm going to try it out tomorrow:)

##Update:

It works~ Yay! Thanks a lot.

0
Entering edit mode

Hi Pierre,

many thanks for this tool! It's very useful also for me.

I have just one question. Is there a way how to get records only for selected locations/snps, not for the whole read? I have quite long list of snps for which I would like to get a record in tsv file. As the tsv file contains all positions of every read covering these snps, it gets really big (hundreds of G). I can post-process the file to select only positions that I am interested in, but it would be much easier if I could limit the positions at the beginning.

As an alternative, I am now splitting my original sam/bam file into a number of small files and I will try to process it in pieces.

I am sorry if my question is naive, I don't have any real bioinformatic background.

Thank you!

0
Entering edit mode
samtools -b view your.bam your.bed | java -jar sam2tsv.jar
`
0
Entering edit mode

Hi Pierre,

thanks for reply. I tried it already but evidently the amount of snps I want to process is way too big. This approach reduces the original bam file to approximately 78% of it's original size.

But I can proceed with genome split in parts.

Thanks