I'm trying to extract the FASTQs from a 1000 Genomes BAM using Picard SamToFastq, but it raises an error:
```
net.sf.picard.PicardException: Found 7657 unpaired mates
	at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:185)
```
Here is the Makefile I'm using:

```makefile
.PHONY: NA12878.fastqs

NA12878_1.fastq.gz NA12878_2.fastq.gz : NA12878.fastqs

# $(basename $@) strips the ".fastqs" suffix, giving "NA12878"
NA12878.fastqs: NA12878.bam
	java -jar /path/to/picard-tools-1.87/SamToFastq.jar I=$< \
		VALIDATION_STRINGENCY=SILENT \
		FASTQ=$(basename $@)_1.fastq SECOND_END_FASTQ=$(basename $@)_2.fastq
	gzip --best $(basename $@)_1.fastq $(basename $@)_2.fastq

NA12878.bam:
	curl -o $@ "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/phase2b_alignment/data/NA12878/exome_alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20120522_p2b.bam"
```
I also tried FixMateInformation, but it raised the same error. How can I fix this?
The mates are missing, and FixMateInformation does not work as you expect, because the chrom20 BAM files contain only reads that mapped to chromosome 20. If a read's mate mapped to another chromosome, that mate is not in this file, so there is nothing for either tool to pair it with. All mate pairs should be present in the "mapped" BAM files distributed by the project.
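To check that this explains the error, you can count how many reads in the chrom20 file have their mate on another chromosome. The following is a minimal sketch, assuming SAM text input such as the output of `samtools view NA12878.bam`; the function name `mate_stats` is hypothetical, and its notion of "unpaired" (a paired, primary read name seen only once) approximates, but is not necessarily identical to, Picard's own check.

```python
from collections import Counter
from typing import Iterable, Tuple


def mate_stats(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Hypothetical helper: return (mates_on_other_chrom, unpaired_names).

    mates_on_other_chrom: paired primary records whose RNEXT (column 7)
        names a reference other than their own RNAME (column 3).
    unpaired_names: paired primary read names seen only once, i.e. whose
        mate record is absent from the input.
    """
    other_chrom = 0
    seen = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        qname, flag, rname, rnext = fields[0], int(fields[1]), fields[2], fields[6]
        if not flag & 0x1:
            continue  # read is not paired
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        seen[qname] += 1
        # "=" means same reference as RNAME, "*" means unknown/unavailable
        if rnext not in ("=", "*", rname):
            other_chrom += 1
    unpaired = sum(1 for count in seen.values() if count == 1)
    return other_chrom, unpaired
```

For example, from a script you could call `mate_stats(sys.stdin)` while piping in `samtools view NA12878.bam`. If the second number is close to the count Picard reports, the missing mates are the ones that mapped outside chromosome 20.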