4.6 years ago
O.rka ▴ 660
I used BBMap to isolate the r1/r2 reads that mapped to my reference while also writing out the unmatched reads. It generated a single file. Is it wise to use this in my assembler (SPAdes) as single-ended reads?
I ended up using the outm=./bbmap_output/mapped.fq output from the command below as input into SPAdes as single-ended reads with the -s flag. My mapped.fq file contains not only the paired r1/r2 reads but also the unpaired (singleton) reads. My BBMap command was:
bbwrap.sh -Xmx40g in=./reads/r1.fq,$SINGLETONS in2=./reads/r2.fq,null outm=./bbmap_output/mapped.fq outu=./bbmap_output/unmapped.fq ref=./reference/assembly.fa out=./bbmap_output/output.sam lengthtag=t idtag=t covstats=./bbmap_output/output.covstats.txt rpkm=./bbmap_output/output.rpkm.txt threads=$N_JOBS usemodulo append
For SPAdes, my command was the following:
python spades-3.9.0/bin/spades.py -t $N_JOBS -s ./bbmap_output/mapped.fq -o ./spades_output/
Which tool from BBMap did you use? reformat.sh? If so, then you must have made an interleaved reads file.
Section 3.1 of the SPAdes manual describes how to specify interleaved reads.
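For reference, here is a minimal Python sketch of how the SPAdes read options fit together. Per the SPAdes manual, `--12` takes one file of interleaved (interlaced) forward/reverse pairs, `-1`/`-2` take separate forward/reverse files, and `-s` takes unpaired reads. The helper name `build_spades_cmd` and the file names are hypothetical, and the sketch assumes `spades.py` is on the PATH (the thread calls `python spades-3.9.0/bin/spades.py` instead). It only builds the argument list and does not run anything.

```python
from typing import List, Optional


def build_spades_cmd(
    out_dir: str,
    threads: int,
    interleaved: Optional[str] = None,
    r1: Optional[str] = None,
    r2: Optional[str] = None,
    singles: Optional[str] = None,
) -> List[str]:
    """Build a spades.py argument list (hypothetical helper; nothing is executed)."""
    if interleaved and (r1 or r2):
        raise ValueError("give pairs either interleaved or as r1/r2, not both")
    if bool(r1) != bool(r2):
        raise ValueError("r1 and r2 must be given together")
    if not (interleaved or r1 or singles):
        raise ValueError("no read files given")
    cmd = ["spades.py", "-t", str(threads)]  # assumes spades.py is on PATH
    if interleaved:
        cmd += ["--12", interleaved]  # one file, mates in consecutive records
    elif r1:
        cmd += ["-1", r1, "-2", r2]  # separate forward/reverse files
    if singles:
        cmd += ["-s", singles]  # unpaired reads go in their own file
    cmd += ["-o", out_dir]
    return cmd


# Interleaved pairs plus a separate singleton file:
print(" ".join(build_spades_cmd("spades_output", 8, interleaved="pairs.fq", singles="singles.fq")))
# spades.py -t 8 --12 pairs.fq -s singles.fq -o spades_output
```

The `--12` file is expected to hold only pairs, so a file with singletons mixed in presumably needs splitting into a pairs file and a singletons file before it fits these options cleanly. The list can be passed to `subprocess.run(cmd, check=True)` once the paths are real.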
@genomax Thank you. I've added my command above in the original description. Are the reads still interleaved if the file includes both the paired r1/r2 reads and the unpaired reads?
You used bbmap.sh to align your data to assembly.fa. That alignment produced output.sam, which is in SAM format rather than FASTQ. You also captured the reads that did not align in unmapped.fq, and that file is probably interleaved, since you started with PE reads but provided only a single output file name.
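On the question of a file that mixes pairs and singletons: in an interleaved file, mates sit in consecutive records, so a file such as mapped.fq (pairs plus appended singletons) can in principle be split by checking whether neighbouring records share a read name. Below is a minimal Python sketch, assuming plain 4-line FASTQ records and mate names that differ only in a trailing /1 or /2 or in the comment after the first space (Illumina 1.8+ style). The function names are hypothetical; for real data, BBTools also ships repair.sh, which is intended for re-pairing reads and separating singletons.

```python
import io
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(handle) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:  # empty string means end of file
            return
        yield (lines[0], lines[1], lines[2], lines[3])


def read_id(header: str) -> str:
    """Name shared by both mates: drop '@', any comment, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs(records: Iterator[Record]) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Pair consecutive records with the same read ID; everything else is a singleton."""
    pairs: List[Tuple[Record, Record]] = []
    singles: List[Record] = []
    pending = None
    for rec in records:
        if pending is not None and read_id(pending[0]) == read_id(rec[0]):
            pairs.append((pending, rec))
            pending = None
        else:
            if pending is not None:
                singles.append(pending)  # previous record had no adjacent mate
            pending = rec
    if pending is not None:
        singles.append(pending)
    return pairs, singles


# Tiny demo: one interleaved pair followed by one singleton.
demo = io.StringIO(
    "@r1/1\nACGT\n+\nIIII\n"
    "@r1/2\nTTGC\n+\nIIII\n"
    "@s7 1:N:0:1\nGGGA\n+\nIIII\n"
)
pairs, singles = split_pairs(read_fastq(demo))
print(len(pairs), len(singles))  # 1 1
```

The pairs could then be written back out as one interleaved file and the singletons as another, so that each goes to SPAdes under the matching option.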
If you wanted to assemble the data, then there is no need to align it first. You should start with your scanned/trimmed original data files and go into SPAdes directly after that.
Note: If this question is related to, or a follow-up on, "Extracted mapped shotgun metagenomic reads to reference genome. SPAdes or metaSPAdes for de-novo assembly?", then you are on the right track. You can retrieve reads that mapped to your assembly by doing
You can then use these reads in your SPAdes assembly.
Apologies, I forgot to add the bbwrap.sh command that I ended up using. I've updated the question with the correct command and my input into SPAdes. Can reformat.sh create singletons? If I were able to regenerate the cleaned singletons.fq FASTQ files, would assembling these with spades.py -1 r1.fq -2 r2.fq -s singletons.fq result in a better assembly than spades.py -s mapped.fq, where mapped.fq has everything merged into one file?
Unless you have a really high number of singletons, I suggest that you ignore them for now and try an assembly with properly paired reads.
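The answer leaves "really high" as a judgment call. One way to see where a dataset stands is to count the reads. Below is a minimal Python sketch that counts records in the R1 file and the singleton file and reports the share of all reads that are unpaired. It assumes uncompressed 4-line FASTQ. The function names and file names are hypothetical, and no particular cutoff is implied.

```python
import os
import tempfile


def count_fastq_reads(path: str) -> int:
    """Number of reads in an uncompressed 4-line FASTQ file."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def singleton_fraction(r1_path: str, singles_path: str) -> float:
    """Share of all reads that are unpaired; each R1 record stands for a pair (two reads)."""
    n_pairs = count_fastq_reads(r1_path)
    n_singles = count_fastq_reads(singles_path)
    total = 2 * n_pairs + n_singles
    return n_singles / total if total else 0.0


# Tiny demo: 3 pairs (only R1 needs counting) and 2 singletons.
with tempfile.TemporaryDirectory() as tmp:
    r1 = os.path.join(tmp, "r1.fq")
    singles = os.path.join(tmp, "singletons.fq")
    with open(r1, "w") as fh:
        fh.write("@p\nACGT\n+\nIIII\n" * 3)
    with open(singles, "w") as fh:
        fh.write("@s\nACGT\n+\nIIII\n" * 2)
    frac = singleton_fraction(r1, singles)

print(f"{frac:.2f}")  # 0.25
```

Counting only R1 assumes R1 and R2 hold the same number of records, which is what properly paired files should look like.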
@GenoMax Hi, sorry to reply on such an old post, but I have a similar question. The kneaddata output has unmatched_1.fq and unmatched_2.fq, which contain reads whose mates were lost but which themselves passed both the Trimmomatic and Bowtie2 steps. In this case, at what step of downstream processing would reads without mates be an issue? Thanks in advance.
You will need to provide some context. What exactly are you trying to do?
My apologies. I want to use these sequences for taxonomic and functional profiling (I plan to use MetaPhlAn and HUMAnN) and also to assemble them using MEGAHIT/metaSPAdes. I would like to know whether using those unmatched reads will affect these steps.
I am not sure which steps you are referring to. If the programs you are planning to use accept singleton reads, then you will be able to use them. If they require paired-end data, then these reads are not going to be useful.