Tutorial: Convert bam files to fastq in numbers as indicated in samtools flagstat stats.
3
3.6 years ago by
kirannbishwa011.2k
United States
kirannbishwa011.2k wrote:

One of the situation when bam to fastq conversion might be required:

Say, I have a problem where I want to take the reads that are aligned to chromosome 2 of one species. And, I want to see how much (fraction) of these same reads align to chr 2 of another species or population and with how much mismatches.

I used several method but wasn't able to get the number of reads in `fastq files` as indicated in `flagstat statistics` output. This bitwise method seems to be more appropriate, if people want to extract the exact amount of reads designated by `samtools flagstat`; but if others have different opinion please suggest.

``````Step 01: Select reads from Chromosome 2
\$ samtools view -h -b realigned_2ms04h.bam 2 > chr2.2ms04h.bam

# sort the bam alignment
\$ samtools sort -n chr2.2ms04h.bam chr2.sorted.2ms04h.bam

# fix mate pairs/info
\$ samtools fixmate chr2.sorted.2ms04h.bam chr2.fixmate.2ms04h.bam

# Let's check the simple statistics of the bam file
\$ samtools flagstat chr2.fixmate.2ms04h.bam > chr2.fixmate.stats

# Stats output from "chr2.fixmate.stats" file:
1421823 + 0 duplicates
7549722 + 0 mapped (100.00%:-nan%)
7549722 + 0 paired in sequencing
6936646 + 0 properly paired (91.88%:-nan%)
6937102 + 0 with itself and mate mapped
612620 + 0 singletons (8.11%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Step 02: Convert the mate-fixed bam to fastq using one of the below methods

# A) using bam2fastq
\$ bamToFastq -i chr2.fixmate.2ms04h.bam -fq chr2.2ms04h.R1 -fq2 chr2.2ms04h.R2

# B) using Picard
\$ java -jar picard2.9.jar SamToFastq I=chr2.fixmate.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ = R2.fq UNPAIRED_FASTQ = unPaired.fq
``````

But, `bam2fastq and picard` didn't work for me.

So, contd... `Step 02 - C`

``````    If, you want to select only the properly paired reads use `bitwise flag -f 2`. if you want to remove duplicated reads use `bitwise flag -F 1024`
# 2-C (optional extra step) : Select properly paired (bitwise flag: -f 2) and remove duplicates (flag: -F 1024)
\$ samtools view -b -f 2 -F 1024 chr2.fixmate.2ms04h.bam > chr2.Paired_DeDup.bam
\$ samtools flagstat chr2.Paired_DeDup.bam > chr2.Paired_DeDup.stats

# Stats output from "chr2.Paired_DeDup.stats" file:
0 + 0 duplicates
5629430 + 0 mapped (100.00%:-nan%)
5629430 + 0 paired in sequencing
5629430 + 0 properly paired (100.00%:-nan%)
5629430 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

# 2-C:  Now, convert the bam files into Read1/2 matepairs, using bitwise flag
\$ samtools view -uf64 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R1.fq.gz
\$ samtools view -uf128 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R2.fq.gz

# Unzip the files and count the number of reads in R1, R2 read pairs
# Count
\$ cat chr2.2ms04h_R1.fq | echo \$((`wc -l`/4))
3061503
\$ cat chr2.2ms04h_R2.fq | echo \$((`wc -l`/4))
2567927
``````
• You can see that the number of reads reported by `samtools flagstat` and reads (R1 and R2) obtained using `bitwise method` are the same.
• If you didn't remove the the duplicates, singletons reads, those can also be extracted using bitwise flag and will be equal to the numbers reported in `samtools flagstat` file chr2.fixmate.stats.
written 3.6 years ago by kirannbishwa011.2k

Thanks for this tutorial! But do you know a way i can generate a file of only unpaired reads, based on the chr2.2ms04h_R1.fq and chr2.2ms04h_R2.fq files?