no unique mapped paired-end reads in BAM file
0
0
Entering edit mode
4.2 years ago
Ric ▴ 430

Hi, I would like to only unique mapped paired-end reads. Reads which are not anymore in pair I would like to remove. After the the below steps the BAM file did not contain any reads anymore:

  1. aligned pair-end data with bowtie2.
  2. Removed duplicates with MarkDuplicates
  3. kept aligned reads only with mapping score above 40 with samtools
  4. fixed read information with
  5. filtered only proper paired with samtools

usage: sh unique_reads_dedup_pbs.sh /path/to/FQdata

for r1 in $(find $1 -name "*.bam"); do

#cat <<eof qsub="" &lt;<eof<="" p="">

!/bin/bash -l

PBS -N $r1

PBS -l walltime=43:00:00

PBS -j oe

PBS -l mem=190G

PBS -l ncpus=2

PBS -m bea

cd \$PBS_O_WORKDIR

conda activate picard picard -Xmx490g MarkDuplicates ASSUME_SORT_ORDER=coordinate REMOVE_DUPLICATES=true INPUT=$r1 OUTPUT=${r1}.dedup.bam METRICS_FILE=${r1}.dedup.metrics.txt conda deactivate

conda activate hisat2 samtools index ${r1}.dedup.bam conda deactivate

conda activate hisat2 samtools view -Sb -q 40 ${r1}.dedup.bam > ${r1}.dedup.q40.bam samtools index ${r1}.dedup.q40.bam conda deactivate

conda activate picard TMP_DIR=pwd/tmp mkdir \$TMP_DIR echo \$TMP_DIR picard -Xmx190g FixMateInformation \ I=${r1} \ O=${r1}.dedup.q40.FixMate.bam \ ASSUME_SORTED=false \ SORT_ORDER=coordinate conda deactivate

conda activate hisat2 samtools view -@ 12 -hb -F 2317 ${r1}.dedup.q40.FixMate.bam > ${r1}.dedup.q40.FixMate.clean.bam samtools index ${r1}.dedup.q40.FixMate.clean.bam conda deactivate

EOF

done

Please below samtools stats for step the last 2,3 and 4 steps:

sorted.bam.stats

SN      raw total sequences:    1513528
SN      filtered sequences:     0
SN      sequences:      1513528
SN      is sorted:      1
SN      1st fragments:  756764
SN      last fragments: 756764
SN      reads mapped:   1497136
SN      reads mapped and paired:        1491776 # paired-end technology bit set + both mates mapped
SN      reads unmapped: 16392
SN      reads properly paired:  1148588 # proper-pair bit set
SN      reads paired:   1513528 # paired-end technology bit set
SN      reads duplicated:       0       # PCR or optical duplicate bit set
SN      reads MQ0:      16039   # mapped and MQ=0
SN      reads QC failed:        0
SN      non-primary alignments: 0
SN      total length:   151123824       # ignores clipping
SN      total first fragment length:    75561845        # ignores clipping
SN      total last fragment length:     75561979        # ignores clipping
SN      bases mapped:   149485975       # ignores clipping
SN      bases mapped (cigar):   149485975       # more accurate
SN      bases trimmed:  0
SN      bases duplicated:       0
SN      mismatches:     741385  # from NM fields
SN      error rate:     4.959562e-03    # mismatches / bases mapped (cigar)
SN      average length: 99
SN      average first fragment length:  100
SN      average last fragment length:   100
SN      maximum length: 100
SN      maximum first fragment length:  100
SN      maximum last fragment length:   100
SN      average quality:        37.5
SN      insert size average:    383.0
SN      insert size standard deviation: 68.9
SN      inward oriented pairs:  673761
SN      outward oriented pairs: 832
SN      pairs with other orientation:   181
SN      pairs on different chromosomes: 71114
SN      percentage of properly paired reads (%):        75.9

sorted.bam.dedup.q40.bam.stats

SN      raw total sequences:    731647
SN      filtered sequences:     0
SN      sequences:      731647
SN      is sorted:      1
SN      1st fragments:  366565
SN      last fragments: 365082
SN      reads mapped:   731647
SN      reads mapped and paired:        730807  # paired-end technology bit set + both mates mapped
SN      reads unmapped: 0
SN      reads properly paired:  595732  # proper-pair bit set
SN      reads paired:   731647  # paired-end technology bit set
SN      reads duplicated:       0       # PCR or optical duplicate bit set
SN      reads MQ0:      0       # mapped and MQ=0
SN      reads QC failed:        0
SN      non-primary alignments: 0
SN      total length:   73152156        # ignores clipping
SN      total first fragment length:    36650131        # ignores clipping
SN      total last fragment length:     36502025        # ignores clipping
SN      bases mapped:   73152156        # ignores clipping
SN      bases mapped (cigar):   73152156        # more accurate
SN      bases trimmed:  0
SN      bases duplicated:       0
SN      mismatches:     244351  # from NM fields
SN      error rate:     3.340312e-03    # mismatches / bases mapped (cigar)
SN      average length: 99
SN      average first fragment length:  100
SN      average last fragment length:   100
SN      maximum length: 100
SN      maximum first fragment length:  100
SN      maximum last fragment length:   100
SN      average quality:        37.6
SN      insert size average:    410.3
SN      insert size standard deviation: 60.9
SN      inward oriented pairs:  349296
SN      outward oriented pairs: 39
SN      pairs with other orientation:   14
SN      pairs on different chromosomes: 16054
SN      percentage of properly paired reads (%):        81.4

sorted.bam.dedup.q40.FixMate.bam.stats

SN      raw total sequences:    1513528
SN      filtered sequences:     0
SN      sequences:      1513528
SN      is sorted:      1
SN      1st fragments:  756764
SN      last fragments: 756764
SN      reads mapped:   1497136
SN      reads mapped and paired:        1491776 # paired-end technology bit set + both mates mapped
SN      reads unmapped: 16392
SN      reads properly paired:  1148588 # proper-pair bit set
SN      reads paired:   1513528 # paired-end technology bit set
SN      reads duplicated:       0       # PCR or optical duplicate bit set
SN      reads MQ0:      16039   # mapped and MQ=0
SN      reads QC failed:        0
SN      non-primary alignments: 0
SN      total length:   151123824       # ignores clipping
SN      total first fragment length:    75561845        # ignores clipping
SN      total last fragment length:     75561979        # ignores clipping
SN      bases mapped:   149485975       # ignores clipping
SN      bases mapped (cigar):   149485975       # more accurate
SN      bases trimmed:  0
SN      bases duplicated:       0
SN      mismatches:     741385  # from NM fields
SN      error rate:     4.959562e-03    # mismatches / bases mapped (cigar)
SN      average length: 99
SN      average first fragment length:  100
SN      average last fragment length:   100
SN      maximum length: 100
SN      maximum first fragment length:  100
SN      maximum last fragment length:   100
SN      average quality:        37.5
SN      insert size average:    432.6
SN      insert size standard deviation: 66.2
SN      inward oriented pairs:  673761
SN      outward oriented pairs: 832
SN      pairs with other orientation:   181
SN      pairs on different chromosomes: 71114
SN      percentage of properly paired reads (%):        75.9

Why the last command caused the BAM does not contain any reads?

Thank you in advance,

alignment ngs sequencing • 1.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 2000 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6