losing reads bam to fastq
1
0
Entering edit mode
3.9 years ago

Hi all, I am trying to realign a whole genome BAM file from one reference genome to another. Because the reference genome is updated. The process involves converting the name-sorted BAM file to fastq, then realigning the fastq to a new reference.

1. samtools sort -n input.bam -o input_n.sorted.bam
2. samtools fastq -1 output_r1.fq.gz -2 output_r2.fq.gz -0 output_0.fq input_n.sorted.bam

then map to ref genome:

1. bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz > aligned_reads.sam
2. samtools view -S -b aligned_reads.sam > aligned_reads.bam
3. samtools sort aligned_reads.bam –o aligned_reads_sorted.bam

1. why the number of total reads decrease?
2. primary bam file:
103447197 + 0 in total (QC-passed reads + QC-failed reads)
437391 + 0 secondary
0 + 0 supplementary
763558 + 0 duplicates
102387760 + 0 mapped (98.98%:-nan%)
103009806 + 0 paired in sequencing
51504903 + 0 read1
51504903 + 0 read2
99225776 + 0 properly paired (96.33%:-nan%)
101576826 + 0 with itself and mate mapped
373543 + 0 singletons (0.36%:-nan%)
1967686 + 0 with mate mapped to a different chr
1253576 + 0 with mate mapped to a different chr (mapQ>=5)


new bam file:

103395186 + 0 in total (QC-passed reads + QC-failed reads)
385380 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
102673297 + 0 mapped (99.30% : N/A)
103009806 + 0 paired in sequencing
51504903 + 0 read1
51504903 + 0 read2
100273090 + 0 properly paired (97.34% : N/A)
102010764 + 0 with itself and mate mapped
277153 + 0 singletons (0.27% : N/A)
1151380 + 0 with mate mapped to a different chr
647241 + 0 with mate mapped to a different chr (mapQ>=5)


52011 reads have been lost

3. Should I do fixmate before sorting (5) or not?
4. Part of my data are mark duplicated bam files, I use the following process for them:
5. but lots of reads have been lost.

samtools flagstat for primary bam:

189134632 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
182864553 + 0 mapped (96.68% : N/A)
189134632 + 0 paired in sequencing
94520985 + 0 read1
94613647 + 0 read2
177873826 + 0 properly paired (94.05% : N/A)
181579511 + 0 with itself and mate mapped
1285042 + 0 singletons (0.68% : N/A)
3234229 + 0 with mate mapped to a different chr
2597267 + 0 with mate mapped to a different chr (mapQ>=5)

samtools view -c BL_72656_unmap_sort.bam
2817937

samtools view -c BL_72656_map_sort.bam
72385932


189134632 >>> (2817937+72385932)

Thanks in advance to anyone how can help me to understand if this procedure was right, I miss something, or just this is the output....

Cheers.

sequencing bam fastq fixmate whole genome • 1.8k views
1
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

1
Entering edit mode

Not the source of your issue, but for your information you can simplify the steps below:

bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz > aligned_reads.sam
samtools view -S -b aligned_reads.sam > aligned_reads.bam


To just this:

bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz | samtools sort –o aligned_reads_sorted.bam


And as such save time, disk space and avoid intermediate files.

0
Entering edit mode

Thanks a lot for you kind help.

2
Entering edit mode
3.9 years ago

Hello,

the total number you see in the first line of flagstats is the total number of alignments. Due to secondary or supplementary alignments one read can have more than one alignment.

Saying this you have to subtract the secondary and supplementary alignments from the total to get the number of reads. Doing this you should see that the number is for both alignment the same.

fin swimmer

0
Entering edit mode

OK,Thanks very much for you patient explain.

0
Entering edit mode

Hello soleimani_homa,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.