Downloading 1000G files using samtools view -b -L results in BAM files of different sizes for the same command
8.7 years ago
raunaq.123 • 0

Hi

We are working on the 1000 genomes data and trying to download a subsection of the data using genomic coordinates listed in a bed file.

The command we are using is

samtools view -b -L ../XYZ_coordinates.all.bed ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam > ../1000data/HG00096.bam

The BED file lists 90 different coordinates. We submitted a batch job for all the 1000 Genomes samples and were able to download the BAM files corresponding to the coordinates in the BED file. However, when re-downloading some genomes individually, we get a larger BAM file. For example:

Downloading for NA21090 in the batch job gave a file size of

428K Sep  1 08:46 NA21090.bam

while running the same command again gave a file size of

3.4M Sep  8 19:57 NA21090.bam

Could someone please explain why the same command gives different file sizes in batch submission mode versus individual submission? We used PBS scripts to submit the batch jobs, with each batch script downloading data from 50 FTP locations one after another.

TIA

next-gen sequence 1000Genomes • 2.0k views

I think it's a network issue. Nothing to do with samtools per se.
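If the network is the culprit, one option is to retry failed downloads and only keep output from a run that finished cleanly. Below is a minimal sketch, not a tested recipe for this FTP server. The function names `retry` and `fetch_regions` and the `RETRY_DELAY` variable are made up for illustration, and the approach only helps when samtools reports the interruption through a non-zero exit status, which may not hold for every kind of dropped connection.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command, retrying until it succeeds
# or until max_attempts attempts have been made.
retry() {
    local max_attempts=$1
    shift
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # RETRY_DELAY is an assumed knob (seconds to wait between attempts).
        sleep "${RETRY_DELAY:-5}"
    done
}

# Hypothetical wrapper: write to a temporary file and move it into place
# only if samtools exits with status 0, so a failed run never leaves a
# partial BAM under the final name.
fetch_regions() {
    local bed=$1 url=$2 out=$3
    samtools view -b -L "$bed" "$url" > "$out.tmp" && mv "$out.tmp" "$out"
}

# Example call (url is a placeholder for one of the 1000 Genomes BAM URLs):
# retry 3 fetch_regions ../XYZ_coordinates.all.bed "$url" ../1000data/HG00096.bam
```

Writing to a temporary name first means that anything present under the final file name came from a run that exited successfully.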


Thanks! It does seem to be a network issue rather than a samtools problem. We were downloading multiple files simultaneously, and the BAM files that shared the same generation timestamp were usually the ones that showed this problem.
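To find which samples are affected, the batch output can be compared against individually re-downloaded files by size, as in the NA21090 example. This is a rough sketch assuming both sets live in separate directories under the same file names. The function name `list_size_mismatches` and the directory names in the usage line are hypothetical. A size mismatch only flags a candidate for re-download, and matching sizes do not prove that a file is complete.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the names of BAM files whose byte size differs
# between a batch-download directory and a re-download directory.
list_size_mismatches() {
    local batch_dir=$1 redo_dir=$2
    local redo name size_a size_b
    for redo in "$redo_dir"/*.bam; do
        [ -e "$redo" ] || continue              # no BAM files matched the glob
        name=$(basename "$redo")
        [ -e "$batch_dir/$name" ] || continue   # sample absent from the batch run
        # Arithmetic expansion strips any padding some wc versions emit.
        size_a=$(( $(wc -c < "$batch_dir/$name") ))
        size_b=$(( $(wc -c < "$redo") ))
        if [ "$size_a" -ne "$size_b" ]; then
            echo "$name"
        fi
    done
}

# Example call (directory names are placeholders):
# list_size_mismatches ../1000data ../1000data_recheck
```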
