losing reads bam to fastq
1
0
Entering edit mode
5.3 years ago

Hi all, I am trying to realign a whole genome BAM file from one reference genome to another. Because the reference genome is updated. The process involves converting the name-sorted BAM file to fastq, then realigning the fastq to a new reference.

  1. samtools sort -n input.bam -o input_n.sorted.bam
  2. samtools fastq -1 output_r1.fq.gz -2 output_r2.fq.gz -0 output_0.fq input_n.sorted.bam

then map to ref genome:

  1. bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz > aligned_reads.sam
  2. samtools view -S -b aligned_reads.sam > aligned_reads.bam
  3. samtools sort aligned_reads.bam –o aligned_reads_sorted.bam

 

  1. why the number of total reads decrease?
  2. primary bam file:
     103447197 + 0 in total (QC-passed reads + QC-failed reads)
    437391 + 0 secondary
    0 + 0 supplementary
    763558 + 0 duplicates
    102387760 + 0 mapped (98.98%:-nan%)
    103009806 + 0 paired in sequencing
    51504903 + 0 read1
    51504903 + 0 read2
    99225776 + 0 properly paired (96.33%:-nan%)
    101576826 + 0 with itself and mate mapped
    373543 + 0 singletons (0.36%:-nan%)
    1967686 + 0 with mate mapped to a different chr
    1253576 + 0 with mate mapped to a different chr (mapQ>=5)
    

    new bam file:

    103395186 + 0 in total (QC-passed reads + QC-failed reads)
    385380 + 0 secondary
    0 + 0 supplementary
    0 + 0 duplicates
    102673297 + 0 mapped (99.30% : N/A)
    103009806 + 0 paired in sequencing
    51504903 + 0 read1
    51504903 + 0 read2
    100273090 + 0 properly paired (97.34% : N/A)
    102010764 + 0 with itself and mate mapped
    277153 + 0 singletons (0.27% : N/A)
    1151380 + 0 with mate mapped to a different chr
    647241 + 0 with mate mapped to a different chr (mapQ>=5)
    

    52011 reads have been lost

  3. Should I do fixmate before sorting (5) or not?
  4. Part of my data are mark duplicated bam files, I use the following process for them:
  5. but lots of reads have been lost.

    samtools flagstat for primary bam:

    189134632 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    0 + 0 supplementary
    0 + 0 duplicates
    182864553 + 0 mapped (96.68% : N/A)
    189134632 + 0 paired in sequencing
    94520985 + 0 read1
    94613647 + 0 read2
    177873826 + 0 properly paired (94.05% : N/A)
    181579511 + 0 with itself and mate mapped
    1285042 + 0 singletons (0.68% : N/A)
    3234229 + 0 with mate mapped to a different chr
    2597267 + 0 with mate mapped to a different chr (mapQ>=5)
    
    samtools view -c BL_72656_unmap_sort.bam
    2817937
    
    samtools view -c BL_72656_map_sort.bam
    72385932
    

    189134632 >>> (2817937+72385932)

    Thanks in advance to anyone how can help me to understand if this procedure was right, I miss something, or just this is the output....

    Cheers.

sequencing bam fastq fixmate whole genome • 2.6k views
ADD COMMENT
2
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

ADD REPLY
1
Entering edit mode

Not the source of your issue, but for your information you can simplify the steps below:

bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz > aligned_reads.sam
samtools view -S -b aligned_reads.sam > aligned_reads.bam
samtools sort aligned_reads.bam –o aligned_reads_sorted.bam

To just this:

bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz | samtools sort –o aligned_reads_sorted.bam

And as such save time, disk space and avoid intermediate files.

ADD REPLY
0
Entering edit mode

Thanks a lot for you kind help.

ADD REPLY
3
Entering edit mode
5.3 years ago

Hello,

the total number you see in the first line of flagstats is the total number of alignments. Due to secondary or supplementary alignments one read can have more than one alignment.

Saying this you have to subtract the secondary and supplementary alignments from the total to get the number of reads. Doing this you should see that the number is for both alignment the same.

fin swimmer

ADD COMMENT
0
Entering edit mode

OK,Thanks very much for you patient explain.

ADD REPLY
0
Entering edit mode

Hello soleimani_homa,

Don't forget to follow up on your threads.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work. Upvote|Bookmark|Accept

ADD REPLY

Login before adding your answer.

Traffic: 3232 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6