pysam error when reading .bam file: ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format?
3.2 years ago
daewowo ▴ 80

Error:

f = pysam.AlignmentFile("SRA_sorted.bam","rb")
File "pysam/libcalignmentfile.pyx", line 991, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

Process I followed to get to the error:

I downloaded an SRA dataset from NCBI and used sam-dump from the SRA Toolkit to convert it into a SAM file.

sam-dump --output-file SRA.sam SRA

I checked the file:

samtools quickcheck SRA.sam
>SRA.sam had no targets in header.

I then checked with Picard's ValidateSamFile (run from the GATK jar):

java -jar gatk-package-4.1.9.0-local.jar ValidateSamFile I=SRA.sam MODE=SUMMARY
Error Type  Count
ERROR:MISSING_READ_GROUP    1
ERROR:READ_GROUP_NOT_FOUND  23209332
WARNING:RECORD_MISSING_READ_GROUP   23209332

Looking at the sam file with head it looks OK

1   77  *   0   0   *   *   0   0   TACAGAA...

I used the following to convert to bam file:

samtools sort SRA.sam -o SRA_sorted.bam

I confirmed that the file is in binary format.

I then used the .bam file in a third-party program which uses pysam. The pysam command which threw the error:

f = pysam.AlignmentFile("SRA_sorted.bam","rb")
File "pysam/libcalignmentfile.pyx", line 991, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

I ran Picard on the BAM file, which gave the same errors as for the SAM file shown above.

How can I work out exactly what the error is with pysam opening the file and fix this?

pysam bam sam • 4.8k views

"Looking at the sam file with head it looks OK"

It's not. The header is missing.
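
To check for the missing header directly, here is a minimal sketch in plain Python that lists the reference names declared in the @SQ header lines of a text SAM file. The function name reference_names is made up for illustration, and it assumes an uncompressed SAM file rather than a BAM. Opening the file in pysam with check_sq=False, as the error message suggests, would only skip the check; it would not add the missing references.

```python
def reference_names(sam_path):
    """Return the reference names declared in @SQ header lines of a text SAM file.

    Hypothetical helper for illustration; expects an uncompressed SAM file.
    """
    names = []
    with open(sam_path) as handle:
        for line in handle:
            if not line.startswith("@"):
                # Header lines come first, so stop at the first alignment record.
                break
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        names.append(field[3:])
    return names
```

An empty list from reference_names("SRA.sam") would mean the file declares no reference sequences, which matches what samtools quickcheck reported ("had no targets in header").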


Are you sure this is a valid SAM file that you dumped? There seem to be no headers, and the single line you posted as an example looks like an unaligned read. The read ID is also 1, which is odd. If the original data submitted was fastq, you should align the reads yourself to get SAM/BAM files.
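
To see why that record looks unaligned, the FLAG value in its second column (77) can be decoded bit by bit. This is a small sketch using the bit meanings from the SAM specification; the short names in the table and the function name decode_flag are my own labels.

```python
# FLAG bits as defined in the SAM specification; the names are informal labels.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


print(decode_flag(77))
# ['paired', 'unmapped', 'mate_unmapped', 'first_in_pair']
```

So 77 means a paired read whose own alignment and mate are both unmapped, which fits the "*" in the reference name column and the 0 position.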


Thanks. I used bwa to index a reference genome and then aligned the reads to it.

Now the .sam file has a header (and now I know it needs one) :-)
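
For anyone landing here with the same problem, the steps can be sketched roughly as below. This is only an outline under assumptions: the file names are placeholders, the reads are assumed to be already available as paired FASTQ files, bwa and samtools are assumed to be on the PATH, and align_and_sort is a made-up helper, not the exact commands used in this thread. By default it only returns the commands without running them.

```python
import subprocess


def align_and_sort(reference, reads_1, reads_2, sam_path, bam_path, dry_run=True):
    """Build (and optionally run) index, align and sort steps.

    Hypothetical helper: all paths are supplied by the caller.
    """
    steps = [
        # (command, file that captures stdout, or None)
        (["bwa", "index", reference], None),
        (["bwa", "mem", reference, reads_1, reads_2], sam_path),
        (["samtools", "sort", "-o", bam_path, sam_path], None),
    ]
    if not dry_run:
        for command, stdout_path in steps:
            if stdout_path is None:
                subprocess.run(command, check=True)
            else:
                # bwa mem writes SAM (with @SQ header lines) to stdout.
                with open(stdout_path, "w") as out:
                    subprocess.run(command, check=True, stdout=out)
    return [command for command, _ in steps]


for command in align_and_sort("ref.fa", "reads_1.fq", "reads_2.fq", "aln.sam", "aln_sorted.bam"):
    print(" ".join(command))
```

Because the aligner writes one @SQ line per reference sequence into the header, the sorted BAM should then open in pysam without check_sq=False.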


So things are working now?


Yes, thank you
