Hi, I have some ChIP-seq data sets (FASTQ files) that I want to prepare for peak calling. I aligned the reads with
bowtie2 -x <bowtie_index> -1 <input_fastq_file> -2 <paired_input_fastq_file> -S <output_file.sam>
then filtered the SAM file, converted it to BAM and removed PCR duplicates. I need the files in BED format, but the documentation of the peak callers I want to use says that the input should be in BEDPE format if I want to analyse the reads as paired-end. My question is: do I still need to convert to BEDPE if I already ran Bowtie2 in paired-end mode? The mate information is already in the BAM files, but when I convert to BEDPE I get many warnings like:
****WARNING: Query ....... is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
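The warning suggests that the converter pairs mates by looking only at consecutive records. The post does not name the converter, but the message matches the one from bedtools bamtobed -bedpe, which expects the BAM to be grouped by read name (for example after samtools sort -n); a coordinate-sorted BAM, which is what duplicate removal is commonly run on, separates mates. Below is a minimal Python sketch of that adjacency rule, assuming hypothetical toy reads (the `Alignment` type and `to_bedpe` helper are illustrative, not a real BAM parser). It also shows that a read whose mate was dropped earlier (e.g. by per-read filtering) is skipped even after grouping by name.

```python
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    qname: str  # read (query) name shared by both mates
    chrom: str
    start: int  # 0-based start, as in BED
    end: int  # end-exclusive, as in BED
    strand: str


def to_bedpe(alignments: List[Alignment]) -> Tuple[List[tuple], List[str]]:
    """Build BEDPE rows, pairing a record only with the record right after it.

    Records whose mate is not the next record are skipped, mirroring the
    behaviour described by the warning.
    """
    rows, skipped = [], []
    i = 0
    while i < len(alignments):
        a = alignments[i]
        if i + 1 < len(alignments) and alignments[i + 1].qname == a.qname:
            b = alignments[i + 1]
            # BEDPE columns: chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2
            rows.append((a.chrom, a.start, a.end, b.chrom, b.start, b.end,
                         a.qname, ".", a.strand, b.strand))
            i += 2
        else:
            skipped.append(a.qname)
            i += 1
    return rows, skipped


# Hypothetical toy data: two complete pairs plus one read whose mate is missing.
coord_sorted = [
    Alignment("readA", "chr1", 100, 150, "+"),
    Alignment("readB", "chr1", 120, 170, "+"),
    Alignment("readC", "chr1", 200, 250, "+"),
    Alignment("readA", "chr1", 300, 350, "-"),
    Alignment("readB", "chr1", 320, 370, "-"),
]

# In coordinate order the mates are not adjacent, so every record is skipped.
rows, skipped = to_bedpe(coord_sorted)
print(len(rows), skipped)

# Grouping by read name puts mates next to each other; only the orphan remains.
name_sorted = sorted(coord_sorted, key=lambda r: r.qname)
rows, skipped = to_bedpe(name_sorted)
print(len(rows), skipped)
```

The sketch treats pairing as a purely local check on neighbouring records, which is why sort order alone decides whether most mates are found.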
The output of samtools flagstat <input_file.bam> on one of the files processed with the pipeline above is:
47362874 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
47362874 + 0 mapped (100.00% : N/A)
47362874 + 0 paired in sequencing
24186824 + 0 read1
23176050 + 0 read2
47074699 + 0 properly paired (99.39% : N/A)
47148896 + 0 with itself and mate mapped
213978 + 0 singletons (0.45% : N/A)
44868 + 0 with mate mapped to a different chr
44695 + 0 with mate mapped to a different chr (mapQ>=5)
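The flagstat numbers can be cross-checked with simple arithmetic. Since the output reports 0 secondary and 0 supplementary records, each read should appear once, so every pair with both mates in the file contributes exactly one read1 and one read2 record. The gap between the read1 and read2 counts is therefore a lower bound on how many records have no mate in the file. A small Python sketch of these checks, with the counts copied from the output above (the dictionary keys and the `check_pairing` helper are hypothetical names):

```python
# QC-passed counts copied from the flagstat output above.
flagstat = {
    "total": 47362874,
    "secondary": 0,
    "supplementary": 0,
    "mapped": 47362874,
    "paired": 47362874,  # "paired in sequencing"
    "read1": 24186824,
    "read2": 23176050,
    "properly_paired": 47074699,
    "both_mapped": 47148896,  # "with itself and mate mapped"
    "singletons": 213978,
}


def check_pairing(fs: dict) -> dict:
    """Return consistency checks derived from flagstat counts."""
    primary = fs["total"] - fs["secondary"] - fs["supplementary"]
    return {
        "read1_plus_read2_equals_primary": fs["read1"] + fs["read2"] == primary,
        "both_mapped_plus_singletons_equals_mapped":
            fs["both_mapped"] + fs["singletons"] == fs["mapped"],
        # Each complete pair adds one read1 and one read2 record, so any
        # imbalance must come from records whose mate is absent from the file.
        "min_records_without_mate_in_file": abs(fs["read1"] - fs["read2"]),
        "properly_paired_pct": round(100 * fs["properly_paired"] / fs["paired"], 2),
        "singletons_pct": round(100 * fs["singletons"] / fs["paired"], 2),
    }


report = check_pairing(flagstat)
for key, value in report.items():
    print(f"{key}: {value}")
```

For this file the read1/read2 gap is 1,010,774 records, so at least that many records would have no adjacent mate however the BAM is sorted.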
Thank you for your help.