Mate mismatch errors in 1000genomes CRAM files
5.3 years ago

I downloaded some CRAM files for variant calling from the 1000 Genomes FTP server. I also downloaded the reference genome and MD5 cache, following the instructions in this README doc.

However, running Picard's ValidateSamFile reported errors (listed below) for both the CRAM and the BAM converted from it. After I ran FixMateInformation, revalidation reported zero errors.

Is anyone else encountering such issues with 1000genomes GRCh38 CRAM files? What could be the source of these errors?

Errors:

Mate negative strand flag does not match read negative strand of mate

Mate alignment does not match alignment start of mate

Mate CIGAR string does not match CIGAR string of mate

...
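For anyone unsure what these three messages compare, here is a minimal Python sketch of the same kind of check. It is not Picard's implementation. The names `SamRecord`, `parse_sam_line` and `mate_problems` are hypothetical, and it assumes plain SAM text lines for the two primary records of one pair, with the mate CIGAR stored in the optional `MC:Z` tag.

```python
from typing import List, NamedTuple, Optional

REVERSE = 0x10       # SAM flag bit: this read is on the reverse strand
MATE_REVERSE = 0x20  # SAM flag bit: the mate is on the reverse strand


class SamRecord(NamedTuple):
    qname: str
    flag: int
    pos: int                   # POS: 1-based alignment start
    cigar: str
    pnext: int                 # PNEXT: alignment start recorded for the mate
    mate_cigar: Optional[str]  # value of the optional MC:Z tag, if present


def parse_sam_line(line: str) -> SamRecord:
    """Parse the few SAM columns needed for the mate checks."""
    f = line.rstrip("\n").split("\t")
    # Optional tags start at column 12; pick MC:Z:<cigar> if it exists.
    mc = next((t[5:] for t in f[11:] if t.startswith("MC:Z:")), None)
    return SamRecord(f[0], int(f[1]), int(f[3]), f[5], int(f[7]), mc)


def mate_problems(read: SamRecord, mate: SamRecord) -> List[str]:
    """Compare what `read` records about its mate with the mate itself."""
    problems = []
    if bool(read.flag & MATE_REVERSE) != bool(mate.flag & REVERSE):
        problems.append("mate strand flag mismatch")
    if read.pnext != mate.pos:
        problems.append("mate alignment start mismatch")
    # Only compare CIGARs when the read actually carries an MC tag.
    if read.mate_cigar is not None and read.mate_cigar != mate.cigar:
        problems.append("mate CIGAR mismatch")
    return problems


if __name__ == "__main__":
    r1 = parse_sam_line("q1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tMC:Z:50M")
    # Mate whose own strand, start and CIGAR disagree with what r1 recorded.
    r2 = parse_sam_line("q1\t131\tchr1\t305\t60\t45M5S\t=\t100\t250\t*\t*\tMC:Z:50M")
    print(mate_problems(r1, r2))
```

If a check like this flags a pair, one record's mate fields are out of sync with the other record. That is the information FixMateInformation rewrites, which would be consistent with the errors disappearing after running it.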

1000genomes Matepairs CRAM ValidateSAM FixMate • 1.1k views

Can you show the exact commands you used? For example, FixMateInformation has an ADD_MATE_CIGAR=true option; was it used?

For what it's worth, even the Broad Institute doesn't seem totally confident that their tool works well with CRAM files. Can you also make sure you're using the latest version of Picard? It may handle CRAM better now. I know I've had bad luck feeding the output of samtools (which now supports CRAM natively) into Picard. Anecdotally, I found the two pretty much incompatible in one specific project and never figured out why.

You could try looking at external tools for BAM/CRAM validation, e.g. https://genome.sph.umich.edu/wiki/BamUtil
