Any Issues With Samtools Validation Of Complete Genomics Bams?
11.0 years ago

Hello,

We've got some BAM files generated by Complete Genomics, and when we try to validate them with samtools, (a) they take an incredibly long time to validate (approx 100 hours), and (b) we get billions of validation errors indicating that a paired end read is missing its mate pair. I'm trying to figure out if something is wrong with these BAMs, or if for some reason samtools and Complete Genomics BAMs are just a bad combination. Anyone have experience with them?

Thanks!

samtools bam • 2.5k views

What exactly do you mean by 'validate' using samtools? Normally, an aligner will also write the unmapped mate of a read pair to the BAM file. It's reasonable to discard a pair if neither of its reads is mapped. But when one end is mapped and the other remains unmapped, the sequence of the unmapped read can still be used to detect indels with a split-read method. You would still be able to call SNPs and indels using methods other than split-read.
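To see how the pairs in a file fall into these categories, the FLAG column of each record can be tallied. A minimal sketch, assuming SAM text lines as input (for example, the output of `samtools view`). The function name `classify_pairs` and the category labels are made up for illustration; the bit values are the ones defined in the SAM specification.

```python
from collections import Counter

# FLAG bits defined by the SAM specification.
PAIRED = 0x1           # read is part of a pair
PROPER_PAIR = 0x2      # aligner marked the pair as properly aligned
UNMAPPED = 0x4         # this read is unmapped
MATE_UNMAPPED = 0x8    # the mate is unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def classify_pairs(sam_lines):
    """Tally primary paired records by mapping status (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Ignore unpaired reads and non-primary alignments.
        if not flag & PAIRED or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        unmapped = bool(flag & UNMAPPED)
        mate_unmapped = bool(flag & MATE_UNMAPPED)
        if unmapped and mate_unmapped:
            counts["both_unmapped"] += 1
        elif unmapped or mate_unmapped:
            counts["one_end_mapped"] += 1
        else:
            counts["both_mapped"] += 1
        if flag & PROPER_PAIR:
            counts["properly_paired"] += 1
    return counts
```

One way to use it would be to pipe `samtools view` into a small script that calls `classify_pairs(sys.stdin)`. Counts are per record, not per pair, so a fully mapped pair contributes 2 to `both_mapped`.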

11.0 years ago
matted 7.8k

I wouldn't call running samtools fixmate on a BAM validation, since that's not its task.

Just a guess, but are you name-sorting the bam file before running fixmate? This is required, per the documentation, presumably so that reads are adjacent to their mates in the file.

I don't know what will happen if you run it without name-sorting the BAM, but it may produce a low properly-paired estimate like the one you see.
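As a rough sketch of that order of operations, the two steps could be scripted like this. It assumes the samtools 1.x command-line syntax (older 0.1.x releases took an output prefix for `sort` instead of `-o`); the function names and file names are placeholders, not anything from the original poster's pipeline.

```python
import subprocess


def fixmate_commands(in_bam, sorted_bam, fixed_bam):
    """Return the argv lists for name-sorting a BAM and then running fixmate."""
    return [
        # -n sorts by read name, so mates end up adjacent in the file.
        ["samtools", "sort", "-n", "-o", sorted_bam, in_bam],
        # fixmate reads the name-sorted BAM and writes a new BAM.
        ["samtools", "fixmate", sorted_bam, fixed_bam],
    ]


def run_fixmate(in_bam, sorted_bam, fixed_bam):
    """Run both steps in order, stopping if samtools exits with an error."""
    for argv in fixmate_commands(in_bam, sorted_bam, fixed_bam):
        subprocess.run(argv, check=True)
```

For example, `run_fixmate("sample.bam", "sample.namesorted.bam", "sample.fixmate.bam")` would run the sort first and fixmate second. Name-sorting a whole-genome BAM is itself slow and needs plenty of temporary disk space.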

A more natural way to actually validate a bam file is the Picard tool ValidateSamFile.
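A hedged sketch of such a call, using Picard's legacy `KEY=VALUE` argument style. `MODE=SUMMARY` prints a count per error type instead of one line per offending record, which seems the more practical choice when billions of errors are expected. The jar location and the helper name `validate_command` are assumptions for illustration.

```python
def validate_command(bam, picard_jar="picard.jar", mode="SUMMARY"):
    """Build the argv list for Picard ValidateSamFile (hypothetical helper).

    picard_jar is an assumed path; point it at wherever Picard is installed.
    """
    return [
        "java", "-jar", picard_jar,
        "ValidateSamFile",
        f"I={bam}",      # input BAM to validate
        f"MODE={mode}",  # SUMMARY gives per-error-type counts
    ]
```

The resulting list can be passed to `subprocess.run(validate_command("sample.bam"))`. Recent Picard releases also accept a dash-prefixed argument style, so the exact spelling may need adjusting for your version.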

11.0 years ago

Thanks for the question - good clarification! When we run samtools fixmate on a sample BAM, it reports that only a minuscule fraction (2.48%) of the reads were properly paired. This seems very strange. Is there any reason why samtools fixmate might give misleading information for Complete Genomics BAMs?

Thanks!
