Error of bam-to-sam conversion using samtools
5.9 years ago
Gary ▴ 480

Hi,

We ran TopHat2 alignments for 6 RNA-Seq samples on a local Galaxy instance (Ubuntu) and obtained BAM files. However, when I use samtools to convert BAM to SAM on my MacBook Pro (OS X Yosemite 10.10.5), it fails with the error "can't allocate region" (details below). Could you help me resolve this issue? Thank you so much.

Best,

Gary

gary > ll

total 12010872

-rwxrwxrwx@ 1 gary  staff   1.6K Mar 10 10:55 BamToSamToHTSeq.txt

-rw-r--r--@ 1 gary  staff   1.0G Mar 10 10:39 Chuong569_E9_thigh_F_m3.bam

-rw-r--r--@ 1 gary  staff   1.1G Mar 10 10:43 Chuong570_E9_thigh_F_m4.bam

-rw-r--r--@ 1 gary  staff   926M Mar 10 10:39 Chuong571_E9_thigh_F_m5.bam

-rw-r--r--@ 1 gary  staff   947M Mar 10 10:40 Chuong572_E12_S_m3.bam

-rw-r--r--@ 1 gary  staff   982M Mar 10 10:43 Chuong573_E12_S_m4.bam

-rw-r--r--@ 1 gary  staff   882M Mar 10 10:39 Chuong574_E12_S_m5.bam

-rw-r--r--@ 1 gary  staff    47M Mar  3 10:40 galGal4UCSCensGene81_TableBrowser.gtf

gary > samtools view -h -o Chuong572_E12_S_Grem1_m3.sam Chuong572_E12_S_m3.bam

samtools(1075,0x7fff7690b300) malloc: *** mach_vm_map(size=18446744073226854400) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

Segmentation fault: 11

gary > samtools view -h -o Chuong573_E12_S_Grem1_m4.sam Chuong573_E12_S_m4.bam

samtools(1077,0x7fff7690b300) malloc: *** mach_vm_map(size=18446744072434704384) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

Segmentation fault: 11

gary > samtools view -h -o Chuong574_E12_S_Grem1_m5.sam Chuong574_E12_S_m5.bam

samtools(1079,0x7fff7690b300) malloc: *** mach_vm_map(size=18446744071942492160) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

Segmentation fault: 11

gary > samtools view -h -o Chuong569_E9_thigh_F_m3.sam Chuong569_E9_thigh_F_m3.bam

Segmentation fault: 11

gary > samtools view -h -o Chuong570_E9_thigh_F_m4.sam Chuong570_E9_thigh_F_m4.bam

samtools(1084,0x7fff7690b300) malloc: *** mach_vm_map(size=18446744072328564736) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

Segmentation fault: 11

gary > samtools view -h -o Chuong571_E9_thigh_F_m5.sam Chuong571_E9_thigh_F_m5.bam

samtools(1086,0x7fff7690b300) malloc: *** mach_vm_map(size=1580085248) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

samtools TopHat2 RNA-Seq BAM SAM • 3.2k views
2
Entering edit mode

I am just curious: there is a script or document named BamToSamToHTSeq.txt at the top of your listing. Are you converting BAM to SAM in order to read the SAM with HTSeq? Why? HTSeq for Python can read BAM files natively:

your_bam = HTSeq.BAM_Reader( "your.bam" )


It can also write BAM with BAM_Writer(), so if you have to convert BAM to SAM, you can do it with HTSeq.

Regarding the error: you do not have enough RAM (which is strange, unless, as genomax2 pointed out, you compiled samtools yourself and there is now a memory leak for some as yet unknown reason).
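A side note on the numbers in those malloc messages: most of the requested sizes are just below 2^64, which no machine could satisfy, so more RAM alone would not help. Values like that are what a negative signed 64-bit number looks like when it is printed as unsigned. That would be consistent with samtools reading a nonsensical length from the file, although this interpretation is an assumption and not something the output proves. A minimal Python sketch of the reinterpretation (the helper name as_signed_64 is made up for illustration):

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a signed two's-complement value."""
    if value >= 2 ** 63:
        return value - 2 ** 64
    return value


# Sizes copied from the mach_vm_map messages in the question.
reported_sizes = [
    18446744073226854400,
    18446744072434704384,
    18446744071942492160,
    18446744072328564736,
    1580085248,
]

for size in reported_sizes:
    # Sizes at or above 2**63 come out negative; smaller ones are unchanged.
    print(size, "->", as_signed_64(size))
```

The first four sizes come out negative, while the last one (about 1.5 GB) is a plain positive request.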

I am not sure whether this will help, but my first idea is to install samtools with your system's package manager (instead of make), for example:

apt-get install samtools


Otherwise, my second idea is to try other samtools options, like this:

samtools view -H your.bam > output.sam
samtools view your.bam >> output.sam


or this:

samtools view -u your.bam > uncompressed.bam
samtools view uncompressed.bam > output.sam


My last idea is to try another version of samtools (an older one?) and, finally, to test whether bamtools is any better for this particular task: https://github.com/pezmaster31/bamtools/wiki

bamtools convert -format sam -in your.bam -out output.sam

1
Entering edit mode

This is macOS so no apt-get.

1
Entering edit mode

Then install Ruby and Homebrew, and use Homebrew to get the latest samtools, as genomax2 suggested.

Homebrew will also be very useful for other bioinformatics and geospatial tools later on.

By the way, Ruby is another good thing to learn, as it is very good for string processing and can do things beyond awk. From my perspective it is easier and prettier for text processing than Python. Then you have Rails, and things get more fun for the web =) And you never know what languages will be used in the organization you will work for in the future, or in the programs you will need to change, maintain, or debug.

0
Entering edit mode

Oh, I thought that was the topic starter, not you =) We were all writing at the same time, so I did not know he uses macOS when I posted my comment.

0
Entering edit mode

Thank you so much. I have tried your suggestions, but they don't work on my MacBook Pro (details below). By the way, I don't have cmake and have not managed to install bamtools. I only use htseq-count for RNA-Seq quantification; you can see my BamToSamToHTSeq.txt file below. I convert BAM to SAM before quantification because my htseq-count cannot find pysam, although I have installed pysam several times. The directories where I can currently find pysam on my notebook are listed below.

Because samtools can convert other RNA-Seq BAM files, aligned by STAR on Partek Flow (one example is detailed below), I wonder whether something is wrong with these BAM files aligned by TopHat2 on the local Galaxy. Would you be able to check one of my problem BAM files (http://68.181.92.180/~Gary/temporal/ErrorOfBamToSam.bam)? Many thanks.

# The samtools commands I tried

gary > samtools view -H Chuong569_E9_thigh_F_m3.bam > Chuong569_E9_thigh_F_m3.sam

Segmentation fault: 11

gary > samtools view Chuong569_E9_thigh_F_m3.bam >> Chuong569_E9_thigh_F_m3.sam

Segmentation fault: 11

gary > samtools view -u Chuong569_E9_thigh_F_m3.bam > uncompressed_Chuong569.bam

Segmentation fault: 11

gary > apt-get install samtools

gary > htseq-count -f bam -s no -a 10 -t exon -i gene_id -m union Chuong569_E9_thigh_F_m3.bam galGal4UCSCensGene81_TableBrowser.gtf > Chuong569_E9_thigh_F_m3.txt

100000 GFF lines processed.

200000 GFF lines processed.

300000 GFF lines processed.

386159 GFF lines processed.

Please Install PySam to use the BAM_Reader Class (http://code.google.com/p/pysam/)

Error occured when reading beginning of SAM/BAM file.

No module named pysam

[Exception type: ImportError, raised in __init__.py:937]

# My BamToSamToHTSeq.txt file

gary > cat BamToSamToHTSeq.txt

#!/bin/bash

#Convert Bam to Sam to HTSeq

samtools view -h -o Chuong569_E9_thigh_F_m3.sam Chuong569_E9_thigh_F_m3.bam

...

htseq-count -f sam -s no -a 10 -t exon -i gene_id -m union Chuong569_E9_thigh_F_m3.sam galGal4UCSCensGene81_TableBrowser.gtf > Chuong569_E9_thigh_F_m3.txt

...

# Directories I can find pysam on my MacBook Pro

gary > cd /Users/gary/anaconda/pkgs/pysam-0.8.4-py27_0/lib/python2.7/site-packages/pysam/

gary > ls

Pileup.py       cbcf.pxd        csamtools.so        cvcf.so         samfile_util.h

Pileup.pyc      cbcf.so         ctabix.pxd      htslib_util.h       tabix_util.h

__init__.py     cfaidx.pxd      ctabix.so       include/        version.py

__init__.pyc        cfaidx.so       ctabixproxies.pxd   libchtslib.so       version.pyc

calignedsegment.pxd chtslib.pxd     ctabixproxies.so    namedtuple.py

calignedsegment.so  csamfile.pxd        cutils.pxd      namedtuple.pyc

calignmentfile.pxd  csamfile.so     cutils.so       pysam_stream.h

calignmentfile.so   csamtools.pxd       cvcf.pxd        pysam_util.h

gary > cd /Users/gary/anaconda/pkgs/pysam-0.8.4-py27_0/lib/python2.7/site-packages/pysam-0.8.4-py2.7-macosx-10.5-x86_64.egg-info/

gary > ls

SOURCES.txt     native_libs.txt     top_level.txt

gary > cd /Users/gary/anaconda/pkgs/pysam-0.8.4-py27_0/

gary > ls

info/   lib/

gary > cd /Users/gary/anaconda/lib/python2.7/site-packages/pysam-0.8.4-py2.7-macosx-10.5-x86_64.egg-info/

gary > ls

SOURCES.txt     native_libs.txt     top_level.txt

gary > cd /Users/gary/anaconda/lib/python2.7/site-packages/pysam-0.8.4.dist-info/

gary > ls

# samtools can convert bam to sam for other RNA-Seq samples

gary > samtools view -h -o White_Ch28.sam White_Ch28.bam

gary > ll

total 18337176

-rw-r--r--@ 1 gary  staff   1.0G Mar 10 10:39 ErrorOfBamToSam.bam

drwxr-xr-x  9 gary  staff   306B Feb 18 22:44 HiChIP/

-rw-r--r--@ 1 gary  staff   1.2G Mar 11 09:58 White_Ch28.bam

-rw-r--r--  1 gary  staff   6.5G Mar 11 10:08 White_Ch28.sam

gary > samtools view -h -o ErrorOfBamToSam.sam ErrorOfBamToSam.bam

Segmentation fault: 11

1
Entering edit mode

I tried your file http://68.181.92.180/~Gary/temporal/ErrorOfBamToSam.bam

I also get a segmentation fault on that file. Where did you get it from? Its header is broken. It looks like somebody concatenated a text file containing a SAM header with the BAM file.
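For anyone who wants to check their own files for this kind of damage, here is a rough sketch. It relies on two facts from the SAM/BAM specification: a BAM file is BGZF (gzip-compatible) compressed, so it starts with the bytes 1f 8b, and the decompressed stream starts with the magic string "BAM\1". SAM header lines start with "@". The function name and the returned labels are made up for illustration, and the check only looks at the start of the file, so it cannot prove that the rest is intact.

```python
import gzip


def classify_alignment_file(path):
    """Guess what kind of content a file starts with from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(2)
    if head == b"\x1f\x8b":
        # gzip/BGZF container: decompress just enough to see the BAM magic.
        with gzip.open(path, "rb") as handle:
            magic = handle.read(4)
        return "bam" if magic == b"BAM\x01" else "gzip-but-not-bam"
    if head[:1] == b"@":
        # Plain text that looks like a SAM header line (@HD, @SQ, ...).
        return "starts-with-sam-text"
    return "unknown"
```

Calling classify_alignment_file("ErrorOfBamToSam.bam") on a file with a plain-text header glued to the front would be expected to return "starts-with-sam-text" instead of "bam".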

0
Entering edit mode

Thanks a lot. I would not have found this problem without your help. My previous lab built a local Galaxy and used TopHat2 to run the RNA-Seq alignments. They downloaded these BAM files from their local Galaxy and uploaded them to a network-attached storage (NAS) device. I then downloaded the BAM files from the NAS. What other information can I provide to help figure out this issue?

Do you think I still need to install a new version of samtools? After googling, I know how to use Homebrew to install samtools (http://www.danielecook.com/installing-tabix-and-samtools-on-mac/), but I still don't know how to uninstall my old version of samtools properly. I am not really familiar with samtools; could you help me? Many thanks.

1
Entering edit mode

No, another version of samtools will not help. You need to find the intact BAM or SAM file somewhere along that chain. The best approach is to investigate, together with your collaborators, where the problem arose, and to make sure that it has not affected others, that it will not be repeated, and that no data was lost for you or anyone else.

1
Entering edit mode

If you have so many different parties in the chain, checking data consistency at every step is a good idea. The results can be kept in a log, which can itself be stored in the header as @CO (comment) lines: https://samtools.github.io/hts-specs/SAMv1.pdf
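One simple form of such a consistency check is to compute a checksum of each file before it is handed over and compare it after every transfer. The sketch below uses MD5 from Python's standard library; the function name file_md5 and the chunk size are arbitrary choices for illustration, and any other digest would serve equally well.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that multi-gigabyte BAM files
        # never have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest computed on the Galaxy server differs from the one computed after downloading from the NAS, the file changed somewhere in between.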

0
Entering edit mode

And you tried to install PySam on your Mac with no luck? (I have no idea whether this is possible, but keep in mind that Python on a Mac, and updating some of its dependencies, can be very hard to work with. There is, however, a way to have a separate Python that is unrelated to the system's Python.)
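One thing that may be worth checking is which Python interpreter is actually in use, since a package installed under one interpreter (for example Anaconda's) is invisible to another (for example the system's). The small sketch below reports the interpreter path and where a module is loaded from, if it can be imported at all. The function name describe_module is made up for illustration, and whether this is the cause of the htseq-count error is only a guess.

```python
import sys


def describe_module(name):
    """Return (interpreter path, module file or None if the import fails)."""
    try:
        module = __import__(name)
    except ImportError:
        return (sys.executable, None)
    # Built-in modules have no __file__, hence the getattr default.
    return (sys.executable, getattr(module, "__file__", None))


if __name__ == "__main__":
    interpreter, location = describe_module("pysam")
    print("interpreter:", interpreter)
    print("pysam found at:", location)
```

Running this with the same interpreter that htseq-count uses shows whether that interpreter can see pysam.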

0
Entering edit mode

You are right. One of my colleagues, who writes Matlab code, has helped me install PySam on my Mac several times, but it didn't work. That is why I use SAM files, not BAM files, for RNA-Seq quantification with htseq-count. When I have time, I would like to deal with this issue. Do you have any suggestions, or websites where I can learn more? Thanks a lot.

0
Entering edit mode

Use featureCounts for counting in the future. That way you can use BAM files directly.

0
Entering edit mode

I will. Thanks a lot.

1
Entering edit mode

What version of samtools are you using? How much memory do you have on this MacBook (standard 8 GB)? Did you compile samtools yourself?

0
Entering edit mode

I use samtools 1.2 (details below). My MacBook Pro has 16 GB of 1600 MHz DDR3 memory. I did not compile samtools myself. In fact, I don't know how to compile samtools. Many thanks.

gary > samtools --version

samtools 1.2

Using htslib 1.2.1

Copyright (C) 2015 Genome Research Ltd.

2
Entering edit mode

You could use Homebrew to install the latest samtools, which is currently at v1.3.1.

0
Entering edit mode

Thanks a lot. Could you please show me the command line for upgrading samtools with Homebrew?

1
Entering edit mode

Where did you get your samtools from, and how did you install it?

0
Entering edit mode

Sorry, I don't remember where I got my samtools or how I installed it.

0
Entering edit mode

I have to correct some information. If running "make" means compiling samtools myself, then I may have compiled it myself. However, I don't remember where I downloaded samtools or how I installed it, so it is hard to say whether I compiled it or not.

1
Entering edit mode

Did you mix up -i and -o in your samtools command by any chance?

0
Entering edit mode

Because samtools can convert BAM to SAM for my other RNA-Seq samples, I don't think that is the issue in my case. Please see the example below.

gary > pwd

gary > ll

total 4645256

-rw-r--r--@ 1 gary  staff   1.0G Mar 10 10:39 ErrorOfBamToSam.bam

drwxr-xr-x  9 gary  staff   306B Feb 18 22:44 HiChIP/

-rw-r--r--@ 1 gary  staff   1.2G Mar 11 09:58 White_Ch28.bam

gary > samtools view -h -o White_Ch28.sam White_Ch28.bam

gary > ll

total 18337176

-rw-r--r--@ 1 gary  staff   1.0G Mar 10 10:39 ErrorOfBamToSam.bam

drwxr-xr-x  9 gary  staff   306B Feb 18 22:44 HiChIP/

-rw-r--r--@ 1 gary  staff   1.2G Mar 11 09:58 White_Ch28.bam

-rw-r--r--  1 gary  staff   6.5G Mar 11 10:08 White_Ch28.sam

gary > samtools view -h -o ErrorOfBamToSam.sam ErrorOfBamToSam.bam

Segmentation fault: 11

1
Entering edit mode

Perhaps you can try running samtools' regression tests to see whether your installation works as expected? Here is the link to the test directory at version 1.2 on GitHub.