Question: Error in samtools fixmate when dealing with published pair-end data
1
gravatar for jamwest101
11 months ago by
jamwest10110
jamwest10110 wrote:

Hello,

I am trying to analyse a published pair-end ChIP-seq data. However, most of reads info are missing after running samtools fixmate. For example: After sorting bam file by reads name, the report of samtools flagstat shows that:

13431858 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
12770824 + 0 mapped (95.08% : N/A)
13431858 + 0 paired in sequencing
6715929 + 0 read1
6715929 + 0 read2
8381710 + 0 properly paired (62.40% : N/A)
12560308 + 0 with itself and mate mapped
210516 + 0 singletons (1.57% : N/A)
269058 + 0 with mate mapped to a different chr
50930 + 0 with mate mapped to a different chr (mapQ>=5)

then, after running samtools fixmate -m in.bam out.bam and samtools flagstat out.bam. I got following info:

13431858 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
12770824 + 0 mapped (95.08% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

I have processed my own pair-end ChIP-seq data and all the reads info are same as before running fixmate. Is there any thing I need to process differently when dealing with published data? The published data is obtained by fastq-dump -I --split-files SRRXXXX and mapped to genome use Bowtie2 with default settings. My samtools version is 1.7

Thanks,

samtools chip-seq • 549 views
ADD COMMENTlink modified 11 months ago by ATpoint26k • written 11 months ago by jamwest10110

Please show the full command lines and the full SRR number. I assume something in the sort command went wrong, probably sorted by coordinate, rather than name.

ADD REPLYlink modified 11 months ago • written 11 months ago by ATpoint26k

The bam file is sorted by name for sure. samtools sort -n -T temp -o out.bam in.bam I have tried several published pair-end data and all have this problem. For example: SRR1848385 My total command lines list as follow:

./fastq-dump -I --split-files SRR1848385
bowtie2 -x tair10 -1 SRR1848385_1.fastq -2 SRR1848385_2.fastq -S SRR1848385.sam
samtools -bS SRR1848385.sam > SRR1848385.bam
samtools sort -n -T temp -o SRR1848385_namesort.bam SRR1848385.bam
samtools fixmate -m SRR1848385_namesort.bam SRR1848385_fixmate.bam
ADD REPLYlink written 11 months ago by jamwest10110
4
gravatar for ATpoint
11 months ago by
ATpoint26k
Germany
ATpoint26k wrote:

The problem seems to be that using option -I in fastq-dump appends a .1 and .2 to the read name in the fastq files.

fastq-dump --split-files SRR1848385.sra && head -n 1 *.fastq

==> SRR1848385_1.fastq <==
@SRR1848385.1 HISEQ:115:C49JYACXX:1:1101:1708:2225 length=100

==> SRR1848385_2.fastq <==
@SRR1848385.1 HISEQ:115:C49JYACXX:1:1101:1708:2225 length=100

fastq-dump --split-files -I SRR1848385.sra && head -n 1 *.fastq

==> test_1.fq <==
@SRR1848385.1.1 HISEQ:115:C49JYACXX:1:1101:1708:2225 length=100

==> test_2.fq <==
@SRR1848385.1.2 HISEQ:115:C49JYACXX:1:1101:1708:2225 length=100

As a consequence, fixmate interpretes the read names as non-matching and therefore assumes they do not belong to one pair, rendering them unpaired. Try converting to fastq without -I. Also, there is no need to sort by name right after alignment, as paired-end reads are by default ordered by name in the fastq files (and therefore in the SAM file). You can also pipe everything to avoid intermediate files:

bowtie2 -x tair10 -1 SRR1848385_1.fastq -2 SRR1848385_2.fastq | \
  samtools fixmate -m - SRR1848385_fixmate.bam
ADD COMMENTlink modified 11 months ago • written 11 months ago by ATpoint26k

Using -F option with fastq-dump to recover original Illumina fastq headers may be preferable. That removes the @SRR1848385 id from read headers.

ADD REPLYlink modified 11 months ago • written 11 months ago by genomax75k

Thank you for the help!

ADD REPLYlink written 11 months ago by jamwest10110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2115 users visited in the last hour