Hello,
I am trying to analyse a published paired-end ChIP-seq dataset. However, most of the pairing information is lost after running samtools fixmate. For example:
After sorting the BAM file by read name, samtools flagstat reports:
13431858 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
12770824 + 0 mapped (95.08% : N/A)
13431858 + 0 paired in sequencing
6715929 + 0 read1
6715929 + 0 read2
8381710 + 0 properly paired (62.40% : N/A)
12560308 + 0 with itself and mate mapped
210516 + 0 singletons (1.57% : N/A)
269058 + 0 with mate mapped to a different chr
50930 + 0 with mate mapped to a different chr (mapQ>=5)
Then, after running samtools fixmate -m in.bam out.bam and samtools flagstat out.bam, I got the following:
13431858 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
12770824 + 0 mapped (95.08% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
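The two reports differ in a specific way: "in total" and "mapped" are unchanged, while every pairing line drops to zero. In samtools flagstat, as I understand its counting logic, the read1, read2, properly paired, singleton and mate-mapped counters are all nested under a test for FLAG bit 0x1 ("paired"). So losing that single bit is enough to zero all of them. The sketch below is a simplified Python model of that counting (it ignores the QC-fail split and the different-chr lines), not samtools' own code. The function name pairing_counts and the example flags are illustrative.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def pairing_counts(flags):
    """Tally flagstat-style counters from an iterable of SAM FLAG integers.

    Simplified model: no QC-fail split, no mate-on-different-chr lines.
    """
    c = Counter()
    for flag in flags:
        c["total"] += 1
        if not flag & UNMAPPED:
            c["mapped"] += 1
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        # All pairing counters sit under the 0x1 test, so a record
        # without the "paired" bit contributes nothing to any of them.
        if flag & PAIRED:
            c["paired"] += 1
            if flag & READ1:
                c["read1"] += 1
            if flag & READ2:
                c["read2"] += 1
            if flag & PROPER_PAIR and not flag & UNMAPPED:
                c["properly_paired"] += 1
            if not flag & UNMAPPED and not flag & MATE_UNMAPPED:
                c["both_mapped"] += 1
            if flag & MATE_UNMAPPED and not flag & UNMAPPED:
                c["singletons"] += 1
    return c


if __name__ == "__main__":
    # Illustrative flags: a proper pair (99/147) and a pair with one mate unmapped (73/133)
    before = [99, 147, 73, 133]
    after = [f & ~PAIRED for f in before]  # same records with bit 0x1 cleared
    print(dict(pairing_counts(before)))
    print(dict(pairing_counts(after)))
```

Clearing only bit 0x1 reproduces the pattern in the second report: total and mapped stay the same and every pairing counter reads zero. This model shows what the zeros mean; it does not explain why the bit was lost.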
I have processed my own paired-end ChIP-seq data the same way, and there all the pairing information is the same before and after running fixmate.
Is there anything I need to do differently when dealing with published data?
The published data was obtained with fastq-dump -I --split-files SRRXXXX
and mapped to the genome using Bowtie2 with default settings. My samtools version is 1.7.
Thanks,
Please show the full command lines and the full SRR number. I assume something went wrong in the sort command: the file was probably sorted by coordinate rather than by name.
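A quick way to test the sort-order hypothesis: fixmate pairs up consecutive records whose read names (QNAME) are identical, so in a correctly name-sorted paired-end file most records should come in adjacent groups of exactly two. The sketch below assumes the BAM is first converted to SAM text with samtools view and piped in. The script name check_mates.py and the function mate_grouping are hypothetical. Any difference between mate names, even a trailing suffix, makes them count as two separate groups of size 1. Groups larger than 2 would usually indicate secondary or supplementary alignments, which flagstat reports as 0 here.

```python
import sys
from collections import Counter
from itertools import groupby


def mate_grouping(sam_lines):
    """Count runs of adjacent SAM records sharing an identical QNAME.

    Returns a Counter mapping run length -> number of runs.
    """
    # Skip header lines (starting with '@') and blank lines
    records = (ln for ln in sam_lines if ln.strip() and not ln.startswith("@"))
    # QNAME is the first tab-separated field of a SAM record
    qnames = (ln.split("\t", 1)[0] for ln in records)
    # groupby only merges *adjacent* equal names, mirroring fixmate's
    # comparison of each record with the one before it
    return Counter(len(list(run)) for _, run in groupby(qnames))


if __name__ == "__main__":
    # Hypothetical usage: samtools view name_sorted.bam | python check_mates.py
    for size, n in sorted(mate_grouping(sys.stdin).items()):
        print(f"runs of {size} record(s) with the same QNAME: {n}")
```

If nearly all runs have size 1 even though flagstat reported both read1 and read2 records, the mates are either not adjacent or not named identically, and fixmate will not see them as pairs.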
The BAM file is definitely sorted by name:
samtools sort -n -T temp -o out.bam in.bam
I have tried several published paired-end datasets and all of them have this problem, for example SRR1848385. My full command lines are as follows: