Hi all, I have been working on a metagenomics analysis of a microbial community. It was sequenced on Illumina (Nextera library), 2x 150bp. I have obtained a low mapping rate (properly paired ~23%), and I would like to know what can be done to raise it. Here is what I have done:
- Used Trimmomatic to trim away low-quality reads
- Concatenated the forward files into a single forward file (fastq), and did the same for the reverse files
- Ran FastQC to check quality
- Ran IDBA-UD for assembly (mink = 20, maxk = 100)
- Ran Bowtie2 for mapping, then samtools flagstat for statistics, which showed a low mapping rate (~23% properly paired)
The flagstat looks like this:
41641174 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
18804112 + 0 mapped (45.16% : N/A)
41641174 + 0 paired in sequencing
20820587 + 0 read1
20820587 + 0 read2
9847366 + 0 properly paired (23.65% : N/A)
17193728 + 0 with itself and mate mapped
1610384 + 0 singletons (3.87% : N/A)
1906784 + 0 with mate mapped to a different chr
1813704 + 0 with mate mapped to a different chr (mapQ>=5)
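For reference, the percentages in this flagstat output are fractions of the total read count (here the same as the paired-in-sequencing count), not of the mapped reads. A minimal Python sketch of that arithmetic, using only the counts reported above; the helper name pct is just for illustration:

```python
# Counts copied from the flagstat output above.
total = 41641174
mapped = 18804112
properly_paired = 9847366
singletons = 1610384


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"mapped:          {pct(mapped, total):.2f}% of all reads")
print(f"properly paired: {pct(properly_paired, total):.2f}% of all reads")
print(f"singletons:      {pct(singletons, total):.2f}% of all reads")
# The same properly-paired count, expressed relative to mapped reads only.
print(f"properly paired: {pct(properly_paired, mapped):.2f}% of mapped reads")
```

Read this way, more than half of the reads do not map to the assembly at all, which the properly-paired figure alone does not show.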
I am not sure what caused the low mapping rate. Do I need to concatenate the files first and then run Trimmomatic? Trimming first might have caused the forward and reverse files to lose reads at different positions (e.g. the forward file losing a read at position 1000, but the reverse file losing one at position 9000).
What could I do to raise the mapping rate?
Cheers and thanks,
Alan
I am not sure how Trimmomatic works, but for IDBA-UD the fastq reads are usually combined into a single fasta file. IDBA-UD expects the reads to be in the same order in both files before they are combined. Did you check that the reads are in the same order in both fastq files before combining them into a fasta file for IDBA-UD input? If you miss this step, IDBA-UD will not generate good-quality assemblies.
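One way to check the read order is to walk both fastq files in parallel and compare read IDs. Here is a minimal Python sketch. It assumes plain 4-line fastq records (no wrapped sequences) and read names that match once any /1 or /2 suffix is removed. The function names are made up for illustration.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of every FASTQ record (assumes 4 lines per record)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line[1:].split()
            if not fields:
                continue  # tolerate a trailing blank line
            name = fields[0]  # drop the leading '@' and any comment
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # drop the mate suffix
            yield name


def first_mismatch(forward_path, reverse_path):
    """Return (record_index, fwd_id, rev_id) for the first out-of-sync pair, or None."""
    pairs = zip_longest(read_ids(forward_path), read_ids(reverse_path))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:
            return index, fwd, rev
    return None
```

Calling first_mismatch on the forward and reverse files returns None when both files are in sync to the end. If one file is shorter, the missing ID shows up as None in the returned tuple.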
Thank you Sej, I have checked my scripts and confirmed that the reads are in the same order.
I concatenated the samples as below:
cat S1_F.fastq S2_F.fastq S3_F.fastq S4_F.fastq
Then for reverse reads
cat S1_R.fastq S2_R.fastq S3_R.fastq S4_R.fastq
That looks fine to me, for the concatenation step, at least.
You are sending the cat'ed results to a new file, correct?
cat S1_F.fastq S2_F.fastq S3_F.fastq S4_F.fastq > total_F.fastq
I tend to use the fq2fa utility provided in the idba_ud suite with the --merge option, as that is the recommended step to get fasta from paired fastq in two separate files. I am not sure if cat formats the fasta in the same way as fq2fa does; it's worth checking, though.
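To illustrate what a merged paired-end fasta looks like, here is a minimal Python sketch that interleaves two fastq files into one fasta file, with each forward read immediately followed by its mate. This is not fq2fa's implementation, only an illustration of the layout as I understand it. The function names are hypothetical, and it assumes 4-line fastq records in the same order in both files.

```python
from itertools import islice


def fastq_records(handle):
    """Yield (name, sequence) tuples from a 4-line-per-record FASTQ handle."""
    while True:
        chunk = list(islice(handle, 4))
        if len(chunk) < 4:
            return  # end of file (or an incomplete trailing record)
        yield chunk[0].rstrip("\n")[1:], chunk[1].rstrip("\n")


def interleave_to_fasta(forward_path, reverse_path, out_path):
    """Write forward read, then its mate, for every pair; return the pair count.

    zip() stops at the shorter file, so unpaired trailing reads are dropped.
    """
    pairs = 0
    with open(forward_path) as fwd, open(reverse_path) as rev, \
            open(out_path, "w") as out:
        for (f_name, f_seq), (r_name, r_seq) in zip(fastq_records(fwd),
                                                    fastq_records(rev)):
            out.write(f">{f_name}\n{f_seq}\n>{r_name}\n{r_seq}\n")
            pairs += 1
    return pairs
```

A plain cat of the two fastq files would instead put all forward reads first and all reverse reads after them, still in fastq format.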
In the example above, cat is only concatenating the files together. It is not changing the format from fastq --> fasta.
That's correct, and cat also would not format the sequences in the expected input format for IDBA_UD assembly. fq2fa --merge would generate an output fasta file where reads from file_1.fq and file_2.fq are ordered.
In metagenomics, you should check your assembly before going into alignment.
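One simple way to check an assembly before alignment is to look at basic contig statistics. Here is a minimal Python sketch that reads a contigs fasta file and reports the contig lengths and the N50. The function names are hypothetical, and N50 is only one of several possible checks.

```python
def contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None  # length of the contig being read, None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the contig length at which the running total of lengths,
    taken longest first, first reaches half of the assembly size."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

A very short N50 or a small total assembly size relative to the amount of sequence data would be one hint that many reads have no contig to map to.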