Validate the assembly of unmapped reads by remapping the original reads back to the assembled sequence
7 weeks ago
Sony ▴ 10

Hello everyone,

I have paired-end whole-genome sequencing reads from Brassica varieties, along with a reference genome. My objective is to focus on the unmapped reads. Here is my workflow:

  1. Checked the quality of the raw paired-end reads with FastQC, then removed adapter sequences and low-quality bases with Trimmomatic.
  2. Mapped the trimmed paired-end reads to the reference genome using BWA-MEM.
  3. Converted SAM to BAM and sorted the BAM file.
  4. Extracted all unmapped reads and converted them to a FASTQ file:

         samtools view -b -f 4 SRR4289357_mapped.sorted.bam > SRR4289357_unmapped.bam
         samtools sort SRR4289357_unmapped.bam > SRR4289357_unmapped.sorted.bam
         samtools bam2fq SRR4289357_unmapped.sorted.bam > SRR4289357_unmapped.sorted.fastq
  5. Calculated the average insert size and its standard deviation (with bbmap), plus JF_SIZE, for the MaSuRCA configuration file.
  6. Assembled the unmapped reads with MaSuRCA. Statistics of the assembled sequence: [screenshot of assembly statistics]
  7. Summarized the assembly statistics using QUAST: [screenshot of QUAST report]
  8. Validated the assembly by remapping reads back to the sequence assembled from the extracted unmapped reads (based on this tutorial: ), following the steps given there. My expectation when remapping the trimmed paired-end reads to this assembly was: “very few of the reads do not map back to the contigs and the high rate of reads are properly paired which indicate that there are not too many mis-assemblies.” Here are the mapping statistics from remapping the trimmed paired-end reads to the MaSuRCA assembly: [screenshot of mapping statistics]
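
Step 4 filters on the SAM FLAG field, so it may help to see what `-f 4` actually selects. The sketch below is only an illustration: `classify_flag` is a hypothetical helper, not part of samtools, and it relies solely on the two FLAG bits defined in the SAM specification (0x4 = read unmapped, 0x8 = mate unmapped).

```shell
#!/usr/bin/env bash
# classify_flag is a hypothetical helper for illustration, not a samtools command.
# It inspects two SAM FLAG bits: 0x4 (this read is unmapped) and 0x8 (its mate is unmapped).
classify_flag() {
    local flag=$1
    local read_unmapped=$(( (flag & 4) != 0 ))
    local mate_unmapped=$(( (flag & 8) != 0 ))
    if (( read_unmapped && mate_unmapped )); then
        echo "both_unmapped"
    elif (( read_unmapped )); then
        echo "read_unmapped_mate_mapped"
    else
        echo "read_mapped"
    fi
}

classify_flag 77   # 1 + 4 + 8 + 64: paired, read unmapped, mate unmapped, first in pair
classify_flag 69   # 1 + 4 + 64: paired, read unmapped, mate mapped, first in pair
classify_flag 99   # 1 + 2 + 32 + 64: paired, properly paired, mate reverse, first in pair
```

Because `samtools view -f 4` keeps every record with bit 0x4 set, it retains the first two classes, including reads whose mate did map to the reference. Requiring both bits (`-f 12`) would keep only pairs in which neither mate mapped; whether that is the better input for a paired-end assembly depends on the goal.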

Based on my results, only 1.19% of the reads map back to the contigs, and only 0.71% are properly paired. This does not look like what the tutorial I mentioned above describes.

Are my results normal, or did I go wrong somewhere?
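
One possible way to put the mapped-back percentage in context, sketched under assumptions: the contigs were built only from reads that did not map to the reference, so the share of all trimmed reads that map back to them can be compared with the share that was unmapped in step 4. The counts below are hypothetical placeholders, not values from this post, and `percent` is a hypothetical helper; real counts would have to come from the actual BAM files (for example from `samtools flagstat`).

```shell
#!/usr/bin/env bash
# percent is a hypothetical helper: prints part/total as a percentage with two decimals.
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

# Hypothetical placeholder counts, NOT taken from the post.
total_reads=1000000        # all trimmed reads
unmapped_in_step4=15000    # reads with FLAG bit 0x4 set after mapping to the reference
mapped_to_contigs=12000    # trimmed reads that map to the MaSuRCA contigs

echo "unmapped against reference: $(percent "$unmapped_in_step4" "$total_reads")%"
echo "mapped back to contigs:     $(percent "$mapped_to_contigs" "$total_reads")%"
```

If the two percentages are of similar size, a low overall mapping rate may simply reflect that most trimmed reads belong to the reference rather than to the contigs. Remapping only the extracted unmapped reads would be a more direct comparison with the tutorial's expectation.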

unmapped_reads. assembly. Paired-end_reads • 232 views

Please do not post screenshots of text material. These are hard to read (some of us have old eyes). You can copy and paste the text content and then format it using the 101010 button.

If these unmapped reads are random contamination, then they are not likely to assemble into anything meaningful.

