.gstmp extension on .bam files downloaded from a Google bucket, and the EOF marker is absent in the BAM file: is this a download problem, or is my BAM file actually corrupted?
4.5 years ago
Alewa ▴ 170

Dear all,

I have already-aligned .bam files stored in a Google bucket. I'm trying to retrieve them and unmap them to uBAM so that I can run the full analysis following GATK best practices. When I use gsutil cp gs:url/to/file/in/google/bucket, the downloaded .bam files have an additional .gstmp extension.

samtools view tells me that the EOF marker is absent.

Is this a problem with downloading my .bam files from the Google bucket, or is the .bam file itself actually corrupted? How can I tell which is which? Any help will be much appreciated.

next-gen alignment google-bucket • 4.6k views

"and unmap them to uBAM so I will be able to perform all analysis using GATK best practices."

Why do you need to unmap them?


@Pierre, data pre-processing for variant discovery according to GATK best practices requires either FASTQ or uBAM input. In my case I only have aligned .bam files (I didn't do the alignment myself), hence the need to unmap the reads. https://software.broadinstitute.org/gatk/documentation/article?id=6484


It doesn't make sense to me: your reads are already mapped, and you'll unmap them and then remap them?


"samtools view tells me that the EOF marker is absent."

Try downloading the file again. Is there an MD5 checksum available for it in the Google bucket?
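If an MD5 is available, comparing it against the local copy is one way to tell a bad download from a file that was already broken in the bucket. Below is a minimal sketch, assuming the bucket reports the object's MD5 as a base64 string (the form shown by `gsutil ls -L`). The function names `md5_base64` and `matches_bucket_md5` are made up for illustration, and composite objects may not carry an MD5 at all.

```python
import base64
import hashlib


def md5_base64(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large BAM files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_bucket_md5(path, bucket_md5):
    """Compare the local digest with the MD5 string copied from the bucket listing."""
    return md5_base64(path) == bucket_md5.strip()
```

If the digests differ, the local copy is not the same as the object in the bucket, so the download is the likely culprit. If they match and the EOF marker is still missing, the file was probably already truncated when it was uploaded.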

4.5 years ago
Brice Sarver ★ 3.8k

The temporary extension likely means something went wrong during the download, but samtools quickcheck will tell you whether the BAM looks intact (valid header, EOF block present). From the manual:

samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

Quickly check that input files appear to be intact. Checks that beginning of the file contains a valid header (all formats) containing at least one target sequence and then seeks to the end of the file and checks that an end-of-file (EOF) is present and intact (BAM only).

Data in the middle of the file is not read since that would be much more time consuming, so please note that this command will not detect internal corruption, but is useful for testing that files are not truncated before performing more intensive tasks on them.

This command will exit with a non-zero exit code if any input files don't have a valid header or are missing an EOF block. Otherwise it will exit successfully (with a zero exit code).

Options:

-v Verbose output: will additionally print the names of all input files that don't pass the check to stdout. Multiple -v options will cause additional messages regarding check results to be printed to stderr.

-q Quiet mode: disables warning messages on stderr about files that fail. If both -q and -v options are used then the appropriate level of -v takes precedence.
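To make the EOF part of that check concrete, here is a rough Python sketch of the same idea. It is not samtools' implementation. It relies on the SAM/BAM specification, which defines the end-of-file marker as a fixed 28-byte empty BGZF block, and simply tests whether the file ends with those bytes. The function name `has_bgzf_eof` is hypothetical.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification uses as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False  # too short to contain the marker at all
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes without reading the rest of the file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Like quickcheck, this only looks at the end of the file, so it can flag a truncated download but says nothing about corruption in the middle.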


Thanks @Brice Sarver. I downloaded the .bam file again from my Google bucket ($gsutil cp gs://google-bucket-path-to-bam-file.bam), and this time the downloaded file came without the .gstmp extension. samtools quickcheck, together with ValidateSamFile (Picard), helped me rule out any errors in my BAM files.
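For anyone downloading many files at once, a quick way to spot candidates for re-downloading is to look for leftover .gstmp files. This is only a sketch, assuming (as the answer above suggests) that a lingering .gstmp name marks a transfer that did not finish. The function name `find_incomplete_downloads` and the directory argument are illustrative.

```python
from pathlib import Path


def find_incomplete_downloads(directory):
    """Return sorted paths under `directory` whose file names end in .gstmp."""
    return sorted(
        path for path in Path(directory).rglob("*.gstmp") if path.is_file()
    )
```

Anything it returns is worth deleting and fetching again before running quickcheck or ValidateSamFile on the final files.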
