Hello all,
I'm doing a metagenomic analysis on human samples that were sequenced on the MiSeq platform, paired-end reads, 40 to 300 bp long (it varies according to the FastQC report). After the trimming/quality-control phase, I ended up with reads 60 to 190 bp long, but those paired-end reads aren't able to join anymore. :(
I thought about 2 solutions:
- Consider all reads (R1 and R2) as single-end reads, concatenate them all, and proceed with the metagenomic analysis in QIIME, or;
- Use only forward (R1) or reverse (R2) reads for the downstream analysis with QIIME.
Is there a third and better option? Does either of the solutions I thought of make sense?
Thank you all very much for any help or tips you can provide me.
Fernanda Luz
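For context on the trimming step described in the question, here is a minimal Python sketch of 3' quality trimming (the Q20 cutoff and Phred+33 encoding are illustrative assumptions; real trimmers like bbduk or Trimmomatic are more sophisticated). This is what shortens reads from up to 300 bp down to 60-190 bp when the 3' ends have poor quality:

```python
def trim_3prime(seq, qual, min_q=20):
    """Trim bases from the 3' end while their Phred+33 quality is below min_q."""
    end = len(seq)
    while end > 0 and (ord(qual[end - 1]) - 33) < min_q:
        end -= 1
    return seq[:end], qual[:end]

# 'I' encodes Q40, '#' encodes Q2, so the last two bases get clipped.
seq, qual = trim_3prime("ACGTACGT", "IIIIII##")  # -> ("ACGTAC", "IIIIII")
```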
You should always merge the reads first (you can adapter-trim but NOT quality-trim). After merging the reads you can quality-trim if needed. BBMerge from the BBMap suite is what I recommend you try for merging, unless you are following an established pipeline like QIIME or Mothur.
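Conceptually, what mergers like BBMerge or fastq-join do is reverse-complement R2 and look for an overlap with the 3' end of R1. A toy exact-match sketch in Python (real tools allow mismatches and weigh quality scores; this is illustrative only):

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMP)[::-1]

def merge(r1, r2, min_overlap=10):
    """Merge a read pair if R1's 3' end exactly overlaps rc(R2)'s 5' end."""
    rc2 = revcomp(r2)
    # Try the longest possible overlap first, down to min_overlap.
    for olen in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        if r1[-olen:] == rc2[:olen]:
            return r1 + rc2[olen:]
    return None  # no sufficient overlap: the pair cannot be joined
```

With aggressive quality trimming the reads may no longer reach the overlap region at all, which is why merging should come before quality trimming.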
Thank you for your reply, I will give BBMerge a try on the raw reads. I tried to join the reads before trimming them, but I got less than 10% joined reads (I used fastq-join from the QIIME pipeline).
If you expect the reads to join (based on the way you designed your amplicons) and they are not joining, then something may be wrong with the libraries. If your inserts are short and you have a lot of adapter contamination, then you are going to get short reads after the adapters are removed. They should still overlap, but they will not give you the long reads you expect.
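To make the geometry concrete: for paired reads of length L sequenced from an insert of length I, the expected overlap is 2L - I, so trimming reads down shrinks the overlap and can eliminate it entirely. A quick sketch (the insert size is an illustrative assumption, not known for this dataset):

```python
def expected_overlap(read_len, insert_len):
    """Expected overlap in bp between the two reads of a pair (0 if none)."""
    return max(0, 2 * read_len - insert_len)

# Illustration: over a hypothetical 300 bp insert, 190 bp trimmed reads
# still overlap by 80 bp, but reads trimmed to 60 bp no longer meet.
expected_overlap(190, 300)  # -> 80
expected_overlap(60, 300)   # -> 0
```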
Another problem is that I don't have the information about the library preparation, such as the size of the overlap region between R1 and R2. I used fastq-join in default mode. According to the FastQC report, I don't have adapter contamination, just a really bad score at the end of the reads.
Did you have a chance to try bbmerge.sh? Curious to see if it helps improve things.
I'm going to try it today. As soon as I get a result, I will let you know. Thank you :)
Start with the original data for your first bbduk try. Do not worry about the bad quality at the ends of reads.
I ran bbmerge.sh today on one of my samples and got this report:
Not looking good, right?
About bbduk.sh: I don't have the contaminant files for the 'ref' parameter.
For the adapters (if that is what you meant by contaminants), the sequences are available in a file called adapters.fa in the resources directory of the BBMap software bundle.
You have a high percentage of ambiguous overlaps (Ambiguous: 45310 = 83.225%). That could signify that there are simple repeats like ACACACACAAC at the ends of reads, which make overlapping them difficult.
BTW: you don't appear to have many reads. 54443 is a relatively tiny amount of data. Is that correct?
Unfortunately it is correct, my data is really small.
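To see why simple repeats at read ends make overlaps "ambiguous", here is a toy Python check that lists every overlap length at which two read ends align perfectly (sequences are made-up examples). With a periodic tail like ACAC..., several different overlap lengths all match, so the merger cannot safely pick one:

```python
def exact_overlaps(r1, rc2, min_overlap=4):
    """All overlap lengths where r1's 3' end exactly matches rc2's 5' end."""
    return [k for k in range(min_overlap, min(len(r1), len(rc2)) + 1)
            if r1[-k:] == rc2[:k]]

# Three different overlap lengths (4, 6 and 8) all align perfectly,
# so the true overlap is ambiguous.
hits = exact_overlaps("GGTTACACACAC", "ACACACACGGTT")  # -> [4, 6, 8]
```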
I ran bbduk and it only removed 6 reads from my data. Then I reran bbmerge and, as expected, got no change from that 6% of merged reads.
It does look a little like a lost cause, right? Should I consider it single-end data and continue the analysis with only the R1 reads?
At least we can be sure that there is likely no adapter contamination in your data.
Can you take a sample of reads (from the R1/R2 files) and BLAST them at NCBI? Confirm that they are retrieving hits to the right sequences. It is possible that something went wrong somewhere in the experimental procedure and you don't have the right kind of data.
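For the spot-check suggested above, a hypothetical helper to pull a random handful of records out of a FASTQ file for pasting into NCBI BLAST (assumes standard 4-line FASTQ records; the function name and sampling size are just illustrative):

```python
import random

def sample_fastq(lines, n, seed=0):
    """Return n randomly chosen (header, sequence) pairs from FASTQ lines."""
    # FASTQ records are 4 lines each: header, sequence, '+', qualities.
    records = [(lines[i], lines[i + 1]) for i in range(0, len(lines), 4)]
    random.seed(seed)  # fixed seed so the sample is reproducible
    return random.sample(records, min(n, len(records)))
```

Running it on, say, 10 reads and BLASTing the sequences should quickly tell you whether the library contains what you expect.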
Hey Fernanda, Is there any particular reason why the data is of poor quality (according to FastQC)? Are you running FastQC with default parameters? It seems a waste to throw away such long reads.
Thank you for your reply Kevin. I ran FastQC with default parameters, yes.
This is a real example of one of my libraries' FastQC report (just the per base sequence quality):
R1 example
R2 example
Wow, the quality is very poor at the 3' end. Yes, now I can understand the trimming to ~190 bp max. I think the quality is poorer than it should be. You should check to ensure that nothing went wrong during the sequencing process.
The other comments (above) will also help you.