Question: .gstmp extension on .bam files downloaded from a Google bucket, and the EOF marker is absent from the BAM file: is this a download problem, or is my BAM file actually corrupted?
ekwame001 (USA/New York/Icahn School of Medicine at Mount Sinai) wrote, 6 weeks ago:

Dear all,

I have already-aligned .bam files stored in a Google bucket. I'm trying to retrieve them and unmap them to uBAM so that I can perform all analyses following GATK best practices. When I use gsutil cp gs:url/to/file/in/google/bucket, the downloaded .bam files carry an additional .gstmp extension.

samtools view tells me that the EOF marker is absent.

Is this a problem with downloading my .bam files from the Google bucket, or is the .bam file actually corrupted? How can I tell which is which? Any help will be much appreciated.

Modified 6 weeks ago by Brice Sarver; written 6 weeks ago by ekwame001

"and unmap them to uBAM so that I can perform all analyses following GATK best practices."

Why do you need to unmap them?

Reply written 6 weeks ago by Pierre Lindenbaum

@Pierre, data pre-processing for variant discovery according to GATK best practices requires input in either FASTQ or uBAM format. In my case I only have aligned .bam files (I didn't do the alignment myself), hence the need to unmap the reads. https://software.broadinstitute.org/gatk/documentation/article?id=6484

Reply written 4 weeks ago by ekwame001

It doesn't make sense to me: your reads are already mapped, and you'll unmap them and then remap them?

Reply written 4 weeks ago by Pierre Lindenbaum

"samtools view tells me that the EOF marker is absent."

Try downloading the file again. Is there an MD5 checksum available in the Google bucket?
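If the bucket does report an MD5 for the object, the comparison could be done along these lines. This is only a sketch under a few assumptions: that the checksum was copied by hand from the bucket listing, that it is base64-encoded (as far as I know, that is how `gsutil ls -L` prints its "Hash (md5)" field), and that the object has an MD5 at all (composite objects may only carry a CRC32C). The function names are made up for illustration.

```python
import base64
import hashlib


def local_md5_base64(path, chunk_size=1024 * 1024):
    """Return the MD5 of a local file, base64-encoded."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large BAM file is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_bucket_md5(path, bucket_md5_base64):
    """Compare the local MD5 with a value copied by hand from the bucket listing.

    Assumption: the bucket value is the base64 form of the MD5 digest.
    """
    return local_md5_base64(path) == bucket_md5_base64.strip()
```

A mismatch would point at the download rather than at the file stored in the bucket; a match with a still-failing EOF check would suggest the stored file itself is truncated.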

Reply written 6 weeks ago by Pierre Lindenbaum
Brice Sarver (United States) wrote, 6 weeks ago:

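The .gstmp extension is the suffix gsutil puts on a file while the download is still in progress, so a file that still carries it most likely comes from a transfer that did not finish. samtools quickcheck will tell you whether the BAM looks intact (valid header and EOF block). From the manual: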

samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

Quickly check that input files appear to be intact. Checks that beginning of the file contains a valid header (all formats) containing at least one target sequence and then seeks to the end of the file and checks that an end-of-file (EOF) is present and intact (BAM only).

Data in the middle of the file is not read since that would be much more time consuming, so please note that this command will not detect internal corruption, but is useful for testing that files are not truncated before performing more intensive tasks on them.

This command will exit with a non-zero exit code if any input files don't have a valid header or are missing an EOF block. Otherwise it will exit successfully (with a zero exit code).

Options:

-v Verbose output: will additionally print the names of all input files that don't pass the check to stdout. Multiple -v options will cause additional messages regarding check results to be printed to stderr.

-q Quiet mode: disables warning messages on stderr about files that fail. If both -q and -v options are used then the appropriate level of -v takes precedence.
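To make the EOF part of that check concrete, here is a rough Python sketch of it. It is not how samtools implements quickcheck, and it does not look at the header at all; it only tests whether the file ends with the 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The function name `has_bgzf_eof` is hypothetical.

```python
import os

# The 28-byte empty BGZF block defined as the EOF marker in the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        # Too short to contain the marker at all.
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes instead of reading the whole file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Like quickcheck, this says nothing about corruption in the middle of the file; a truncated download fails it, but a file that passes can still be damaged internally.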

Written and modified 6 weeks ago by Brice Sarver

Thanks @Brice Sarver. I downloaded the .bam file again from my Google bucket ($gsutil cp gs://google-bucket-path-to-bam-file.bam), and this time the downloaded file came without the .gstmp extension. samtools quickcheck, together with ValidateSamFile (Picard), helped me rule out any errors in my BAM files.
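For anyone downloading many files at once, a quick way to spot transfers that did not finish is to look for leftover files with the .gstmp suffix. A minimal sketch, assuming (as in this thread) that a lingering .gstmp file marks an incomplete download; the function name and the default suffix argument are only illustrative.

```python
from pathlib import Path


def find_partial_downloads(directory, suffix=".gstmp"):
    """Return the sorted paths of files under `directory` whose names end with `suffix`."""
    # rglob walks subdirectories too, so nested download folders are covered.
    return sorted(p for p in Path(directory).rglob("*" + suffix) if p.is_file())
```

Any file this returns would be a candidate for downloading again before running quickcheck or ValidateSamFile on it.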

Reply written 4 weeks ago by ekwame001