Tutorial:Convert bam files to fastq in numbers as indicated in samtools flagstat stats.
0
3
Entering edit mode
6.9 years ago
kirannbishwa01 ★ 1.6k

One of the situation when bam to fastq conversion might be required:

Say, I have a problem where I want to take the reads that are aligned to chromosome 2 of one species. And, I want to see how much (fraction) of these same reads align to chr 2 of another species or population and with how much mismatches.

I used several method but wasn't able to get the number of reads in fastq files as indicated in flagstat statistics output. This bitwise method seems to be more appropriate, if people want to extract the exact amount of reads designated by samtools flagstat; but if others have different opinion please suggest.

Step 01: Select reads from Chromosome 2
samtools view -h -b realigned_2ms04h.bam 2 > chr2.2ms04h.bam # sort the bam alignment samtools sort -n chr2.2ms04h.bam chr2.sorted.2ms04h.bam

# fix mate pairs/info
$samtools fixmate chr2.sorted.2ms04h.bam chr2.fixmate.2ms04h.bam # Let's check the simple statistics of the bam file$ samtools flagstat chr2.fixmate.2ms04h.bam > chr2.fixmate.stats

# Stats output from "chr2.fixmate.stats" file:
1421823 + 0 duplicates
7549722 + 0 mapped (100.00%:-nan%)
7549722 + 0 paired in sequencing
6936646 + 0 properly paired (91.88%:-nan%)
6937102 + 0 with itself and mate mapped
612620 + 0 singletons (8.11%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Step 02: Convert the mate-fixed bam to fastq using one of the below methods

# A) using bam2fastq
$bamToFastq -i chr2.fixmate.2ms04h.bam -fq chr2.2ms04h.R1 -fq2 chr2.2ms04h.R2 # B) using Picard$ java -jar picard2.9.jar SamToFastq I=chr2.fixmate.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ = R2.fq UNPAIRED_FASTQ = unPaired.fq


But, bam2fastq and picard didn't work for me.

So, contd... Step 02 - C

    If, you want to select only the properly paired reads use bitwise flag -f 2. if you want to remove duplicated reads use bitwise flag -F 1024
# 2-C (optional extra step) : Select properly paired (bitwise flag: -f 2) and remove duplicates (flag: -F 1024)
$samtools view -b -f 2 -F 1024 chr2.fixmate.2ms04h.bam > chr2.Paired_DeDup.bam$ samtools flagstat chr2.Paired_DeDup.bam > chr2.Paired_DeDup.stats

# Stats output from "chr2.Paired_DeDup.stats" file:
0 + 0 duplicates
5629430 + 0 mapped (100.00%:-nan%)
5629430 + 0 paired in sequencing
5629430 + 0 properly paired (100.00%:-nan%)
5629430 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

# 2-C:  Now, convert the bam files into Read1/2 matepairs, using bitwise flag
$samtools view -uf64 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R1.fq.gz$ samtools view -uf128 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R2.fq.gz

# Unzip the files and count the number of reads in R1, R2 read pairs
# Count
$cat chr2.2ms04h_R1.fq | echo$((wc -l/4))
3061503
$cat chr2.2ms04h_R2.fq | echo$((wc -l/4))
2567927

• You can see that the number of reads reported by samtools flagstat and reads (R1 and R2) obtained using bitwise method are the same.
• If you didn't remove the the duplicates, singletons reads, those can also be extracted using bitwise flag and will be equal to the numbers reported in samtools flagstat file chr2.fixmate.stats.
samtools bam alignment fastq sequence • 2.3k views
0
Entering edit mode

Thanks for this tutorial! But do you know a way i can generate a file of only unpaired reads, based on the chr2.2ms04h_R1.fq and chr2.2ms04h_R2.fq files?