How to merge two identical BAM files?
7.7 years ago
Joe ▴ 30

Hi, I'm new to NGS data analysis.

To practice my command-line skills, I want to test EstimateLibraryComplexity after doubling a BAM file. I have one BAM file (1.bam); I copied it (naming the copy 1copy.bam), then merged 1.bam and 1copy.bam using picard-tools MergeSamFiles. I used samtools flagstat to check that all the numbers are doubled.

My questions are: 1. Why is the merged BAM not double the size? 1.bam is 79 MB, but the merged BAM is only 84 MB.

2. I ran EstimateLibraryComplexity on 1.bam and on the merged BAM. READ_PAIRS_EXAMINED for 1.bam is 651770, but for the merged BAM it is only 3548. I think there must be something wrong with my settings.

BTW, the PERCENT_DUPLICATION for the merged BAM is 0.5, which is what I expected. I also used samtools view to inspect the merged BAM, and every read has an identical duplicate.

Thanks,

merge picard tools • 3.0k views

Thanks a lot. I will try changing the read names in 1copy.bam, then redo the whole process.

7.7 years ago

Why is the merged BAM not double the size? 1.bam is 79 MB, but the merged BAM is only 84 MB.

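Compression: BAM files are stored BGZF-compressed, a gzip variant (https://en.wikipedia.org/wiki/Gzip), and similar information that sits close together compresses better. If the merged file is sorted, each read ends up next to its identical copy, so the second copy costs very little extra space.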

$ echo -e "AACTGCTGCTAGCTAGCTAGATGCTGCTGCATGCTGGACCTGATCGATGCATCTAGCA\n167517651758715876138765387158712518" | gzip  | wc -c
78
$ echo -e "AACTGCTGCTAGCTAGCTAGATGCTGCTGCATGCTGGACCTGATCGATGCATCTAGCA\nAACTGCTGCTAGCTAGCTAGATGCTGCTGCATGCTGGACCTGATCGATGCATCTAGCA" | gzip  | wc -c
54
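
The same effect can be seen at a slightly larger scale. The sketch below uses synthetic data, not real BAM records: it generates 1000 pseudo-random 50-base lines with awk, then compares the gzip size of those lines with the size of the same lines when each one is immediately followed by an identical copy, which is roughly what a sorted merge of two identical files looks like. The variable names and the seed are arbitrary choices for illustration.

```shell
# Synthetic stand-in for reads: 1000 pseudo-random 50-base lines (seed is arbitrary).
records=$(awk 'BEGIN {
    srand(42)
    for (i = 0; i < 1000; i++) {
        s = ""
        for (j = 0; j < 50; j++) s = s substr("ACGT", int(rand() * 4) + 1, 1)
        print s
    }
}')

# Compressed size of the original lines (tr strips padding some wc versions add).
single=$(printf '%s\n' "$records" | gzip -c | wc -c | tr -d ' ')

# Compressed size when every line is immediately followed by an identical copy.
doubled=$(printf '%s\n' "$records" | awk '{ print; print }' | gzip -c | wc -c | tr -d ' ')

echo "single: $single bytes, doubled: $doubled bytes"
```

The doubled input has twice as many bytes before compression, but its compressed size should come out well below twice the single size, in line with the 79 MB versus 84 MB observation.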

2. I ran EstimateLibraryComplexity on 1.bam and on the merged BAM. READ_PAIRS_EXAMINED for 1.bam is 651770, but for the merged BAM it is only 3548. I think there must be something wrong with my settings.

Did you change the read names in the second BAM? Inserting reads with exactly the same name and sequence might give some strange results...
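
One way to check whether read names collide after the merge is to count how often each name (QNAME, the first SAM column) occurs. The sketch below assumes paired-end data, where a name normally appears twice (one record per mate, ignoring secondary and supplementary alignments). The function name `count_overused_names`, the file name `merged.bam` and the tiny inline records are made up for illustration.

```shell
# Hypothetical helper: read header-less SAM text on stdin and print how many
# distinct read names occur more than twice.
count_overused_names() {
    cut -f1 | sort | uniq -c | awk '$1 > 2 { n++ } END { print n + 0 }'
}

# Made-up records: readA appears 4 times (as after merging a file with its copy),
# readB appears twice (a normal pair).
overused=$(printf 'readA\t99\nreadA\t147\nreadA\t99\nreadA\t147\nreadB\t99\nreadB\t147\n' \
    | count_overused_names)
echo "names seen more than twice: $overused"

# With real data (assuming samtools is installed and merged.bam is the merged file):
#   samtools view merged.bam | count_overused_names
```

A non-zero count on the merged file would suggest that records from the two inputs share names, which is the situation described above.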


Did you change the read names in the second BAM?

Does not look like it from OP. The original file was copied and pasted to create a duplicate.


I didn't change the read names.


From genomax2

"I see. You are creating a fake duplicate with same sequence data but different headers. You could just change ANGUV to BNGUV which would make it a new flowcell :-)"
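
The renaming suggested in this quote could be done along the following lines. This is only a sketch: `ANGUV` and `BNGUV` are the flowcell fragments from the quote, while the function name `rename_flowcell`, the example read name and the output file name are invented for illustration. Only the read name column is changed; header lines starting with `@` pass through untouched.

```shell
# Hypothetical helper: in SAM text on stdin, replace ANGUV with BNGUV in the
# read name (column 1) of alignment records; header lines are left alone.
rename_flowcell() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/  { print; next }
               { sub(/ANGUV/, "BNGUV", $1); print }'
}

# Made-up example: one header line and one record with an invented read name.
renamed=$(printf '@HD\tVN:1.6\nM01:7:000-ANGUV:1:1101:100:200\t99\tchr1\n' | rename_flowcell)
printf '%s\n' "$renamed"

# With real data (assuming samtools is installed):
#   samtools view -h 1copy.bam | rename_flowcell | samtools view -b -o 1copy.renamed.bam -
```

Both mates of a pair get the same new name, so pairing within the renamed copy is preserved while its names no longer clash with those in 1.bam.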
