SamToFastq: net.sf.picard.PicardException: "Found N unpaired mates"
10.0 years ago

I'm trying to extract the FASTQs from a 1000 Genomes BAM using Picard SamToFastq, but it raises an error:

net.sf.picard.PicardException: Found 7657 unpaired mates
at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:185)


The Makefile:

.PHONY: NA12878.fastqs
NA12878_1.fastq.gz NA12878_2.fastq.gz : NA12878.fastqs
NA12878.fastqs: NA12878.bam
	java -jar /path/to/picard-tools-1.87/SamToFastq.jar I=$< \
		VALIDATION_STRINGENCY=SILENT \
		FASTQ=$(basename $@)_1.fastq SECOND_END_FASTQ=$(basename $@)_2.fastq
	gzip --best $(basename $@)_1.fastq $(basename $@)_2.fastq
NA12878.bam:
	curl -o $@ "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/phase2b_alignment/data/NA12878/exome_alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20120522_p2b.bam"


I also tried FixMateInformation, but it raised the same error. How can I fix this?

Thanks.

picard bam fastq conversion error

The mates are missing, and FixMateInformation does not work as you expect, because the chrom20 BAM files only contain reads that mapped to chromosome 20. If a read's mate mapped to another chromosome, the mate is not in this file. All mate pairs should be present in the "mapped" BAM files distributed by the project.
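To see how many reads in such a per-chromosome file have their mate elsewhere, one can count records whose RNEXT column (column 7) names a different reference. This is a minimal sketch over plain SAM text, such as the output of samtools view; the function name and the toy records are hypothetical.

```python
from collections import Counter


def mates_on_other_chromosomes(sam_lines):
    """Count alignments whose mate is mapped to a different reference.

    Column 7 (RNEXT) is "=" when the mate is on the same reference as the
    read, "*" when that information is unavailable, and otherwise names the
    reference the mate is mapped to.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignments
            continue
        fields = line.rstrip("\n").split("\t")
        rname, rnext = fields[2], fields[6]
        if rnext not in ("=", "*", rname):
            counts[rnext] += 1
    return counts


# Hypothetical toy records in the style of a chromosome 20 subset.
example = [
    "r1\t99\t20\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r2\t97\t20\t500\t60\t4M\t7\t1000\t0\tACGT\tIIII",
    "r3\t65\t20\t800\t60\t4M\tX\t2000\t0\tACGT\tIIII",
]
elsewhere = mates_on_other_chromosomes(example)
print(sum(elsewhere.values()), "reads have their mate on another chromosome")
```

On a real file, the lines of `samtools view NA12878.bam` can be passed in directly, for example as `sys.stdin` in a script fed through a pipe.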

10.0 years ago
lh3

The best way to convert BAM to FASTQ is to use htslib:

htscmd bamshuf -Ou input.bam tmp-prefix | htscmd bam2fq -s se.fq.gz - | gzip > pe.fq.gz


First, it is much faster and more lightweight; second, it is not affected by how the input alignments are sorted.
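The idea behind the bamshuf | bam2fq pipeline (gather all records of a read name together, then write complete pairs and leftover singletons separately) can be illustrated with a small Python sketch. This is not htslib's implementation: bamshuf groups records on disk through temporary files (hence the tmp-prefix argument), whereas this sketch simply keeps everything in a dict keyed by QNAME. It assumes only primary alignments matter (flags 0x100 and 0x800 are skipped) and, like bam2fq, reverse-complements reads flagged 0x10 so the FASTQ reflects the original read orientation. All names are hypothetical.

```python
from collections import defaultdict

# Complement table for restoring reverse-strand reads.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_fastq(fields):
    """Turn one SAM record (list of columns) into a 4-line FASTQ string."""
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x10:  # aligned to the reverse strand: undo the reverse complement
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def split_pairs(sam_lines):
    """Collate records by read name, then return (interleaved pairs, singletons).

    Input order does not matter because grouping is done in a dict keyed by
    QNAME; the cost is holding every record in memory.
    """
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        by_name[fields[0]].append(fields)

    paired, single = [], []
    for records in by_name.values():
        if len(records) == 2:
            # Write read 1 (flag 0x40) before read 2 (flag 0x80).
            records.sort(key=lambda f: 0 if int(f[1]) & 0x40 else 1)
            paired.extend(to_fastq(f) for f in records)
        else:
            single.extend(to_fastq(f) for f in records)
    return "".join(paired), "".join(single)


# Toy input, deliberately not sorted by name or position.
sam = [
    "p1\t83\tchr1\t300\t60\t4M\t=\t100\t-204\tAACG\tABCD",
    "s1\t73\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII",
    "p1\t163\tchr1\t100\t60\t4M\t=\t300\t204\tTTGA\tEFGH",
]
pe, se = split_pairs(sam)
```

Because pairing is decided only after all records are collected, a missing mate turns a read into a singleton instead of an error, which mirrors the behaviour of writing singletons to se.fq.gz.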


Hello, this command outputs a single file containing the paired reads. Is there a parameter that makes it output two FASTQ files, read 1 and read 2, ready for TopHat? Thank you.
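If only an interleaved file is available, it can be split afterwards. A minimal sketch, assuming every FASTQ record is exactly four lines (no wrapped sequences) and that read 1 and read 2 records strictly alternate; the function names and output file names are hypothetical.

```python
import gzip
import io
from itertools import islice


def deinterleave(handle, out1, out2):
    """Copy 4-line FASTQ records from handle, alternating between out1 and out2.

    Returns the number of pairs written. Assumes every record is exactly
    four lines and that read 1 and read 2 strictly alternate.
    """
    outputs = (out1, out2)
    n = 0
    while True:
        record = list(islice(handle, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        outputs[n % 2].writelines(record)
        n += 1
    if n % 2:
        raise ValueError("odd number of records: input is not fully interleaved")
    return n // 2


def deinterleave_gz(path_in, path1, path2):
    """Gzip wrapper, e.g. for an interleaved pe.fq.gz file."""
    with gzip.open(path_in, "rt") as fin, \
            gzip.open(path1, "wt") as f1, gzip.open(path2, "wt") as f2:
        return deinterleave(fin, f1, f2)


# In-memory demonstration with a single pair.
demo_in = io.StringIO("@a/1\nAC\n+\nII\n@a/2\nGT\n+\nII\n")
demo_r1, demo_r2 = io.StringIO(), io.StringIO()
pairs = deinterleave(demo_in, demo_r1, demo_r2)
```

On real data this would be called as `deinterleave_gz("pe.fq.gz", "reads_1.fq.gz", "reads_2.fq.gz")`. It streams the input, so memory use does not grow with file size.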

8.0 years ago
ph.henrich

I came across this thread and tried to use htscmd from htslib. On Ubuntu 12.04 htslib compiles (v 0.2 -1.2), but only htsfile, tabix and bgzip are built. Is htscmd an additional program that is not created during compilation?

I am also trying to extract unpaired reads from a BAM file that I created with samtools view -f4 to extract the unmapped reads. Picard (picard-tools) simply does not work with the FU= parameter for additionally extracting unpaired reads, which is why I would like to try htslib as suggested by lh3. Any help is much appreciated.

Philipp

7.3 years ago
charizanisk

Hi guys,

I am getting a similar error. Here is what I'm trying to do.

I have a lot of data but I want to focus on specific areas because the software I am using is crashing.

So I use samtools to extract certain regions:

samtools view whatever_sorted.bam chr1:115240000-115270000 chr12:25300000-25500000 chr3:178800000-178990000 chr7:55080000-55280000 > fetched.sam


Then I remove the reads that look unpaired, meaning their mate is aligned somewhere else in the genome (column 7, RNEXT, is not "="), with:

awk '{if ($7 == "=") print $0}' fetched.sam > fetchedpaired.sam


This brought the number of unpaired reads reported by Picard down from 1124 to 90, but when I run Picard SamToFastq I still get the same error:

SAM validation error: ERROR: Found 90 unpaired mates

Could it be that some mates are aligned to the same chromosome but away from the first read, outside the fetched regions, so that even though the 7th column is "=" the mate is not in the file and the read is considered unpaired? Or is something else going on? How can I fix this?
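One way to test that hypothesis is to list the reads that pass the awk filter but whose QNAME occurs only once, and then check where their mate starts (PNEXT, column 8). A minimal sketch, assuming 1-based inclusive coordinates for the regions from the samtools view command above; secondary and supplementary records (flag 0x900) are ignored as a simplifying assumption, and the helper name is hypothetical. Because samtools returns reads that overlap a region, a mate that starts slightly before a region can still be fetched, so presence is decided by QNAME and the region test is only a hint.

```python
from collections import Counter

# Regions from the samtools view command above (1-based, inclusive).
REGIONS = {
    "chr1": [(115240000, 115270000)],
    "chr12": [(25300000, 25500000)],
    "chr3": [(178800000, 178990000)],
    "chr7": [(55080000, 55280000)],
}


def find_orphans(sam_lines, regions):
    """Return (qname, chrom, mate_pos, mate_starts_in_region) for reads lacking a mate.

    Mirrors the awk filter: only records with RNEXT == "=" are considered.
    A read is an orphan when its QNAME occurs only once among those records.
    """
    records = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] != "=" or int(fields[1]) & 0x900:
            continue
        records.append(fields)

    name_counts = Counter(f[0] for f in records)
    result = []
    for f in records:
        if name_counts[f[0]] == 1:
            chrom, mate_pos = f[2], int(f[7])
            inside = any(start <= mate_pos <= end
                         for start, end in regions.get(chrom, []))
            result.append((f[0], chrom, mate_pos, inside))
    return result


# Toy records: pair "a" is complete, read "b" has its mate outside chr7's region.
example = [
    "a\t99\tchr7\t55100000\t60\t4M\t=\t55100200\t204\tACGT\tIIII",
    "a\t147\tchr7\t55100200\t60\t4M\t=\t55100000\t-204\tACGT\tIIII",
    "b\t97\tchr7\t55279990\t60\t4M\t=\t55400000\t120014\tACGT\tIIII",
]
for hit in find_orphans(example, REGIONS):
    print(hit)
```

Running it over fetchedpaired.sam would show whether the remaining orphans all have mates starting outside the fetched regions.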

I need to end up with separate files for Read1 and Read2.

7.0 years ago

I ran into the same issue. One solution is to use samtools to keep only properly paired reads (samtools view -hf 0x2); this way Picard SamToFastq will not complain about unpaired mates. Here is an example that pipes samtools into Picard to get Read1 and Read2 files for chr21 only:

samtools view -hf 0x2 Sample.sorted.bam chr21 | \
java -jar picard.jar SamToFastq I=/dev/stdin \
FASTQ=Sample_chr21_r1.fq  SECOND_END_FASTQ=Sample_chr21_r2.fq \
UNPAIRED_FASTQ=Sample_chr21_up.fq


Note that even though I included UNPAIRED_FASTQ, this file will always be empty, because samtools has already filtered out read pairs where only one read maps to chr21.
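The -f 0x2 option keeps only records whose FLAG has the "properly paired" bit set, and -h keeps the header lines. The same test, written as a minimal Python sketch over SAM text lines (not how samtools implements it; the function name and toy header are hypothetical):

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned according to the aligner


def keep_proper_pairs(sam_lines):
    """Yield header lines and alignments whose FLAG has bit 0x2 set."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # like samtools view -h, keep the header
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & PROPER_PAIR:
            yield line


lines = [
    "@SQ\tSN:chr21\tLN:1000",  # toy header, not the real chr21 length
    "r1\t99\tchr21\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r2\t97\tchr21\t500\t60\t4M\tchr5\t900\t0\tACGT\tIIII",
]
kept = list(keep_proper_pairs(lines))
```

Here r2 is dropped because its mate is on chr5 and flag 97 lacks the 0x2 bit, which is exactly the kind of read that would otherwise trigger the "unpaired mates" error.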