Hi, I would like to only unique mapped paired-end reads. Reads which are not anymore in pair I would like to remove. After the the below steps the BAM file did not contain any reads anymore:
- aligned pair-end data with bowtie2.
- Removed duplicates with MarkDuplicates
- kept aligned reads only with mapping score above 40 with samtools
- fixed read information with
- filtered only proper paired with samtools
usage: sh unique_reads_dedup_pbs.sh /path/to/FQdata
for r1 in $(find $1 -name "*.bam"); do
#cat <<eof qsub="" <<eof<="" p="">
!/bin/bash -l
PBS -N $r1
PBS -l walltime=43:00:00
PBS -j oe
PBS -l mem=190G
PBS -l ncpus=2
PBS -m bea
cd \$PBS_O_WORKDIR
conda activate picard picard -Xmx490g MarkDuplicates ASSUME_SORT_ORDER=coordinate REMOVE_DUPLICATES=true INPUT=$r1 OUTPUT=${r1}.dedup.bam METRICS_FILE=${r1}.dedup.metrics.txt conda deactivate
conda activate hisat2 samtools index ${r1}.dedup.bam conda deactivate
conda activate hisat2 samtools view -Sb -q 40 ${r1}.dedup.bam > ${r1}.dedup.q40.bam samtools index ${r1}.dedup.q40.bam conda deactivate
conda activate picard TMP_DIR=
pwd
/tmp mkdir \$TMP_DIR echo \$TMP_DIR picard -Xmx190g FixMateInformation \ I=${r1} \ O=${r1}.dedup.q40.FixMate.bam \ ASSUME_SORTED=false \ SORT_ORDER=coordinate conda deactivateconda activate hisat2 samtools view -@ 12 -hb -F 2317 ${r1}.dedup.q40.FixMate.bam > ${r1}.dedup.q40.FixMate.clean.bam samtools index ${r1}.dedup.q40.FixMate.clean.bam conda deactivate
EOF
done
Please below samtools stats for step the last 2,3 and 4 steps:
sorted.bam.stats
SN raw total sequences: 1513528
SN filtered sequences: 0
SN sequences: 1513528
SN is sorted: 1
SN 1st fragments: 756764
SN last fragments: 756764
SN reads mapped: 1497136
SN reads mapped and paired: 1491776 # paired-end technology bit set + both mates mapped
SN reads unmapped: 16392
SN reads properly paired: 1148588 # proper-pair bit set
SN reads paired: 1513528 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 16039 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 0
SN total length: 151123824 # ignores clipping
SN total first fragment length: 75561845 # ignores clipping
SN total last fragment length: 75561979 # ignores clipping
SN bases mapped: 149485975 # ignores clipping
SN bases mapped (cigar): 149485975 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 741385 # from NM fields
SN error rate: 4.959562e-03 # mismatches / bases mapped (cigar)
SN average length: 99
SN average first fragment length: 100
SN average last fragment length: 100
SN maximum length: 100
SN maximum first fragment length: 100
SN maximum last fragment length: 100
SN average quality: 37.5
SN insert size average: 383.0
SN insert size standard deviation: 68.9
SN inward oriented pairs: 673761
SN outward oriented pairs: 832
SN pairs with other orientation: 181
SN pairs on different chromosomes: 71114
SN percentage of properly paired reads (%): 75.9
sorted.bam.dedup.q40.bam.stats
SN raw total sequences: 731647
SN filtered sequences: 0
SN sequences: 731647
SN is sorted: 1
SN 1st fragments: 366565
SN last fragments: 365082
SN reads mapped: 731647
SN reads mapped and paired: 730807 # paired-end technology bit set + both mates mapped
SN reads unmapped: 0
SN reads properly paired: 595732 # proper-pair bit set
SN reads paired: 731647 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 0 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 0
SN total length: 73152156 # ignores clipping
SN total first fragment length: 36650131 # ignores clipping
SN total last fragment length: 36502025 # ignores clipping
SN bases mapped: 73152156 # ignores clipping
SN bases mapped (cigar): 73152156 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 244351 # from NM fields
SN error rate: 3.340312e-03 # mismatches / bases mapped (cigar)
SN average length: 99
SN average first fragment length: 100
SN average last fragment length: 100
SN maximum length: 100
SN maximum first fragment length: 100
SN maximum last fragment length: 100
SN average quality: 37.6
SN insert size average: 410.3
SN insert size standard deviation: 60.9
SN inward oriented pairs: 349296
SN outward oriented pairs: 39
SN pairs with other orientation: 14
SN pairs on different chromosomes: 16054
SN percentage of properly paired reads (%): 81.4
sorted.bam.dedup.q40.FixMate.bam.stats
SN raw total sequences: 1513528
SN filtered sequences: 0
SN sequences: 1513528
SN is sorted: 1
SN 1st fragments: 756764
SN last fragments: 756764
SN reads mapped: 1497136
SN reads mapped and paired: 1491776 # paired-end technology bit set + both mates mapped
SN reads unmapped: 16392
SN reads properly paired: 1148588 # proper-pair bit set
SN reads paired: 1513528 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 16039 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 0
SN total length: 151123824 # ignores clipping
SN total first fragment length: 75561845 # ignores clipping
SN total last fragment length: 75561979 # ignores clipping
SN bases mapped: 149485975 # ignores clipping
SN bases mapped (cigar): 149485975 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 741385 # from NM fields
SN error rate: 4.959562e-03 # mismatches / bases mapped (cigar)
SN average length: 99
SN average first fragment length: 100
SN average last fragment length: 100
SN maximum length: 100
SN maximum first fragment length: 100
SN maximum last fragment length: 100
SN average quality: 37.5
SN insert size average: 432.6
SN insert size standard deviation: 66.2
SN inward oriented pairs: 673761
SN outward oriented pairs: 832
SN pairs with other orientation: 181
SN pairs on different chromosomes: 71114
SN percentage of properly paired reads (%): 75.9
Why the last command caused the BAM does not contain any reads?
Thank you in advance,