Tutorial:Convert bam files to fastq in numbers as indicated in samtools flagstat stats.
Entering edit mode
6.9 years ago
kirannbishwa01 ★ 1.6k

One of the situation when bam to fastq conversion might be required:

Say, I have a problem where I want to take the reads that are aligned to chromosome 2 of one species. And, I want to see how much (fraction) of these same reads align to chr 2 of another species or population and with how much mismatches.

I used several method but wasn't able to get the number of reads in fastq files as indicated in flagstat statistics output. This bitwise method seems to be more appropriate, if people want to extract the exact amount of reads designated by samtools flagstat; but if others have different opinion please suggest.

Step 01: Select reads from Chromosome 2
         $ samtools view -h -b realigned_2ms04h.bam 2 > chr2.2ms04h.bam

       # sort the bam alignment
         $ samtools sort -n chr2.2ms04h.bam chr2.sorted.2ms04h.bam

       # fix mate pairs/info
         $ samtools fixmate chr2.sorted.2ms04h.bam chr2.fixmate.2ms04h.bam

       # Let's check the simple statistics of the bam file
         $ samtools flagstat chr2.fixmate.2ms04h.bam > chr2.fixmate.stats

          # Stats output from "chr2.fixmate.stats" file:
          7549722 + 0 in total (QC-passed reads + QC-failed reads)
          1421823 + 0 duplicates
          7549722 + 0 mapped (100.00%:-nan%)
          7549722 + 0 paired in sequencing
          3771662 + 0 read1
          3778060 + 0 read2
          6936646 + 0 properly paired (91.88%:-nan%)
          6937102 + 0 with itself and mate mapped
          612620 + 0 singletons (8.11%:-nan%)
          0 + 0 with mate mapped to a different chr
          0 + 0 with mate mapped to a different chr (mapQ>=5)

Step 02: Convert the mate-fixed bam to fastq using one of the below methods

       # A) using bam2fastq 
         $ bamToFastq -i chr2.fixmate.2ms04h.bam -fq chr2.2ms04h.R1 -fq2 chr2.2ms04h.R2

       # B) using Picard
         $ java -jar picard2.9.jar SamToFastq I=chr2.fixmate.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ = R2.fq UNPAIRED_FASTQ = unPaired.fq

But, bam2fastq and picard didn't work for me.

So, contd... Step 02 - C

    If, you want to select only the properly paired reads use `bitwise flag -f 2`. if you want to remove duplicated reads use `bitwise flag -F 1024`
        # 2-C (optional extra step) : Select properly paired (bitwise flag: -f 2) and remove duplicates (flag: -F 1024) 
        $ samtools view -b -f 2 -F 1024 chr2.fixmate.2ms04h.bam > chr2.Paired_DeDup.bam
        $ samtools flagstat chr2.Paired_DeDup.bam > chr2.Paired_DeDup.stats 

        # Stats output from "chr2.Paired_DeDup.stats" file:
        5629430 + 0 in total (QC-passed reads + QC-failed reads)
        0 + 0 duplicates
        5629430 + 0 mapped (100.00%:-nan%)
        5629430 + 0 paired in sequencing
        3061503 + 0 read1
        2567927 + 0 read2
        5629430 + 0 properly paired (100.00%:-nan%)
        5629430 + 0 with itself and mate mapped
        0 + 0 singletons (0.00%:-nan%)
        0 + 0 with mate mapped to a different chr
        0 + 0 with mate mapped to a different chr (mapQ>=5)

         # 2-C:  Now, convert the bam files into Read1/2 matepairs, using bitwise flag
        $ samtools view -uf64 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R1.fq.gz
        $ samtools view -uf128 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R2.fq.gz

         # Unzip the files and count the number of reads in R1, R2 read pairs
         # Count
        $ cat chr2.2ms04h_R1.fq | echo $((`wc -l`/4))
        $ cat chr2.2ms04h_R2.fq | echo $((`wc -l`/4))
  • You can see that the number of reads reported by samtools flagstat and reads (R1 and R2) obtained using bitwise method are the same.
  • If you didn't remove the the duplicates, singletons reads, those can also be extracted using bitwise flag and will be equal to the numbers reported in samtools flagstat file chr2.fixmate.stats.
samtools bam alignment fastq sequence • 2.3k views
Entering edit mode

Thanks for this tutorial! But do you know a way i can generate a file of only unpaired reads, based on the chr2.2ms04h_R1.fq and chr2.2ms04h_R2.fq files?


Login before adding your answer.

Traffic: 1347 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6