Picard AddOrReplaceReadGroups Results In Smaller File
11.8 years ago
DG 7.3k

Hi Everyone,

I have recently started mapping and variant calling for six whole-exome sequencing projects (six different individuals). I have already mapped the reads to the reference and converted the SAM files to BAM using Picard. I then added read group data to each file with AddOrReplaceReadGroups and took the opportunity to sort by coordinate at the same time. Because this step adds data, I am a little puzzled that the resulting files are smaller, given that the inputs were already BAM files. Each file is about 3-4 GB smaller. Is this normal, or should I be worried? An example command line was:

java -Xmx2g -jar /usr/local/bin/AddOrReplaceReadGroups.jar INPUT=1804.bam OUTPUT=1804.sorted.bam SORT_ORDER=coordinate RGLB=8 RGPL=Illumina RGPU=1 RGSM=1804

Thanks everyone

bam exome-sequencing picard • 6.9k views
11.8 years ago
brentp 24k

What is the original file size? Sorting should aid compression because similar records end up close together. You can always check the number of mapped reads (-F 4 excludes unmapped reads) by doing something like:

samtools view -F 4 -c 1804.sorted.bam
samtools view -F 4 -c 1804.bam

and you should get the same thing.


1804.bam: 15 GB
1804.sorted.bam: 11 GB

And you were right, it looks like the same number of reads. Apparently the sorted BAM files just compress better, which isn't something I expected, but it makes perfect sense once I think about it.
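
For anyone who wants to see the effect in isolation, here is a minimal sketch in Python. It does not touch real BAM files. It assumes only that BAM blocks are DEFLATE-compressed (BGZF), and uses zlib on made-up, alignment-like text records. The reference length, read length, record layout, and the helper names make_records and compressed_size are all hypothetical choices for illustration.

```python
import random
import zlib


def make_records(n_reads, ref_len=200_000, read_len=100, seed=0):
    """Build fake alignment-like records: contig, position, read sequence.

    The reads are sampled at random positions from a random synthetic
    reference, so the list order mimics unsorted mapper output.
    """
    rng = random.Random(seed)
    reference = "".join(rng.choices("ACGT", k=ref_len))
    records = []
    for _ in range(n_reads):
        pos = rng.randrange(0, ref_len - read_len)
        records.append(f"chr1\t{pos:06d}\t{reference[pos:pos + read_len]}\n")
    return records


def compressed_size(records):
    """Size in bytes of the records after DEFLATE compression (zlib level 6)."""
    return len(zlib.compress("".join(records).encode("ascii"), 6))


records = make_records(20_000)
# Coordinate sort: overlapping reads become neighbours in the stream.
sorted_records = sorted(records, key=lambda r: int(r.split("\t")[1]))

unsorted_size = compressed_size(records)
sorted_size = compressed_size(sorted_records)
print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
```

After sorting, overlapping reads sit next to each other, so the compressor can encode most of each read as a back-reference to the previous one. In the unsorted stream the overlapping reads are usually too far apart to fall inside the compressor's window. The record count is identical in both cases, which matches what the samtools check showed.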


That's an excellent explanation. Nice thinking.
