Question: losing reads bam to fastq
0
gravatar for soleimani_homa
7 months ago by
soleimani_homa0 wrote:

Hi all, I am trying to realign a whole genome BAM file from one reference genome to another. Because the reference genome is updated. The process involves converting the name-sorted BAM file to fastq, then realigning the fastq to a new reference.

  1. samtools sort -n input.bam -o input_n.sorted.bam
  2. samtools fastq -1 output_r1.fq.gz -2 output_r2.fq.gz -0 output_0.fq input_n.sorted.bam

then map to ref genome:

  1. bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz > aligned_reads.sam
  2. samtools view -S -b aligned_reads.sam > aligned_reads.bam
  3. samtools sort aligned_reads.bam –o aligned_reads_sorted.bam

 

  1. why the number of total reads decrease?
  2. primary bam file:
     103447197 + 0 in total (QC-passed reads + QC-failed reads)
    437391 + 0 secondary
    0 + 0 supplementary
    763558 + 0 duplicates
    102387760 + 0 mapped (98.98%:-nan%)
    103009806 + 0 paired in sequencing
    51504903 + 0 read1
    51504903 + 0 read2
    99225776 + 0 properly paired (96.33%:-nan%)
    101576826 + 0 with itself and mate mapped
    373543 + 0 singletons (0.36%:-nan%)
    1967686 + 0 with mate mapped to a different chr
    1253576 + 0 with mate mapped to a different chr (mapQ>=5)
    

    new bam file:

    103395186 + 0 in total (QC-passed reads + QC-failed reads)
    385380 + 0 secondary
    0 + 0 supplementary
    0 + 0 duplicates
    102673297 + 0 mapped (99.30% : N/A)
    103009806 + 0 paired in sequencing
    51504903 + 0 read1
    51504903 + 0 read2
    100273090 + 0 properly paired (97.34% : N/A)
    102010764 + 0 with itself and mate mapped
    277153 + 0 singletons (0.27% : N/A)
    1151380 + 0 with mate mapped to a different chr
    647241 + 0 with mate mapped to a different chr (mapQ>=5)
    

    52011 reads have been lost

  3. Should I do fixmate before sorting (5) or not?
  4. Part of my data are mark duplicated bam files, I use the following process for them:
  5. but lots of reads have been lost.

    samtools flagstat for primary bam:

    189134632 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    0 + 0 supplementary
    0 + 0 duplicates
    182864553 + 0 mapped (96.68% : N/A)
    189134632 + 0 paired in sequencing
    94520985 + 0 read1
    94613647 + 0 read2
    177873826 + 0 properly paired (94.05% : N/A)
    181579511 + 0 with itself and mate mapped
    1285042 + 0 singletons (0.68% : N/A)
    3234229 + 0 with mate mapped to a different chr
    2597267 + 0 with mate mapped to a different chr (mapQ>=5)
    
    samtools view -c BL_72656_unmap_sort.bam
    2817937
    
    samtools view -c BL_72656_map_sort.bam
    72385932
    

    189134632 >>> (2817937+72385932)

    Thanks in advance to anyone how can help me to understand if this procedure was right, I miss something, or just this is the output....

    Cheers.

ADD COMMENTlink modified 7 months ago by finswimmer12k • written 7 months ago by soleimani_homa0
1

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

ADD REPLYlink written 7 months ago by RamRS23k
1

Not the source of your issue, but for your information you can simplify the steps below:

bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz > aligned_reads.sam
samtools view -S -b aligned_reads.sam > aligned_reads.bam
samtools sort aligned_reads.bam –o aligned_reads_sorted.bam

To just this:

bwa mem –t 12 -M -R "@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1" reference.fa raw_read1.fq.gz raw_read2.fq.gz | samtools sort –o aligned_reads_sorted.bam

And as such save time, disk space and avoid intermediate files.

ADD REPLYlink modified 7 months ago • written 7 months ago by WouterDeCoster40k

Thanks a lot for you kind help.

ADD REPLYlink written 7 months ago by soleimani_homa0
2
gravatar for finswimmer
7 months ago by
finswimmer12k
Germany
finswimmer12k wrote:

Hello,

the total number you see in the first line of flagstats is the total number of alignments. Due to secondary or supplementary alignments one read can have more than one alignment.

Saying this you have to subtract the secondary and supplementary alignments from the total to get the number of reads. Doing this you should see that the number is for both alignment the same.

fin swimmer

ADD COMMENTlink modified 7 months ago • written 7 months ago by finswimmer12k

OK,Thanks very much for you patient explain.

ADD REPLYlink written 7 months ago by soleimani_homa0

Hello soleimani_homa,

Don't forget to follow up on your threads.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work. Upvote|Bookmark|Accept

ADD REPLYlink written 7 months ago by Devon Ryan91k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 692 users visited in the last hour