Question

Downloading 1000G files using samtools view -b -L results in BAM files of different sizes for the same command

0

8.7 years ago

raunaq.123 • 0

Hi

We are working with the 1000 Genomes data and trying to download a subset of it using genomic coordinates listed in a BED file.

The command we are using is

samtools view -b -L ../XYZ_coordinates.all.bed ftp://ftptrace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.ow_coverage.20120522.bam > ../1000data/HG00096.bam

The BED file lists 90 different regions. We submitted a batch job for all of the 1000 Genomes samples and were able to download the BAM files corresponding to the coordinates in the BED file. However, when re-downloading some genomes individually, we get a larger BAM file. For example:

Downloading for NA21090 in the batch job gave a file size of

428K Sep  1 08:46 NA21090.bam

while running the same command again gave a file size of

3.4M Sep  8 19:57 NA21090.bam

Could someone please explain why the same command gives different file sizes in batch submission mode versus individual submissions? We used PBS scripts to submit the batch jobs, where each batch file downloaded data from 50 FTP locations one after another.

TIA

next-gen sequence 1000Genomes • 2.0k views

updated 20 months ago by Ram 43k • written 8.7 years ago by raunaq.123 • 0

1

I think it's a network issue. Nothing to do with samtools per se.

updated 20 months ago by Ram 43k • written 8.7 years ago by GouthamAtla 12k

0

Thanks! It seems more like a network issue than a samtools problem. We were downloading multiple files simultaneously, and the BAM files that shared the same creation timestamp were usually the ones that showed this problem.

updated 20 months ago by Ram 43k • written 8.7 years ago by raunaq.123 • 0
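
Since the symptom in this thread is that two runs of the same command produce BAM files of different sizes, one way to find the affected samples is to compare the two sets of downloads file by file. The following is only a minimal sketch: the function name `find_size_mismatches` and the directory names `batch_run` and `rerun` are made up for illustration and do not come from the thread. It assumes both runs used the same samtools version and options, and matching sizes do not prove that the contents are identical.

```python
from pathlib import Path


def find_size_mismatches(first_dir, second_dir, pattern="*.bam"):
    """Return {file name: (size in first_dir, size in second_dir)} for differing files.

    A file that is missing from second_dir is reported with a size of None.
    """
    mismatches = {}
    for first_path in sorted(Path(first_dir).glob(pattern)):
        second_path = Path(second_dir) / first_path.name
        first_size = first_path.stat().st_size
        second_size = second_path.stat().st_size if second_path.is_file() else None
        if first_size != second_size:
            mismatches[first_path.name] = (first_size, second_size)
    return mismatches


if __name__ == "__main__":
    # Hypothetical directory names: point these at wherever the two runs wrote their output.
    for name, (batch_size, rerun_size) in find_size_mismatches("batch_run", "rerun").items():
        print(f"{name}: batch={batch_size} bytes, rerun={rerun_size} bytes")
```

Any sample reported here would be a candidate for re-downloading on its own rather than alongside other simultaneous transfers.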