What is the best short-read mapper for a very large reference FASTA?
3 months ago
O.rka ▴ 350

I have 88 metagenomes, each assembled individually, and I have dereplicated the resulting contigs. Now I want to map 91 metatranscriptomes to this very large reference to see what doesn't map. I want to save the reads that don't map into separate fastq files. I'm dealing with bacteria, so there are no splicing events to account for.

What tool would you suggest to do the following:

• Build the index

• Map the metaT to the large metaG reference

• Convert the sam to fastq for only paired sequences that do not map to reference

My original plan was to use BWA for the mapping, pipe to bbmap's (or bbsuite? or bbtools?) reformat.sh program to convert from sam, only get unmapped reads, and output fastq. The BWA index took about 7 hours to make last time I tried this (before I realized there were a lot of duplicate sequences). I'm going to try this again later today and I'm wondering if I should try BWA or Bowtie2 or maybe something else.

Any help or feedback is appreciated.

mapping alignment fastq bowtie2 bwa read • 215 views
3 months ago
GenoMax 106k

You can simply use bbmap.sh to do the alignments (it is as capable as any aligner out there) and save the reads that don't map using outu=. If you don't want to keep the alignments themselves, just leave out out=. No need to do any conversions. I think outu1= and outu2= should keep the reads in separate files if you have paired-end data.
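As a rough sketch of the suggestion above, the call for one paired-end sample could look like the following. All file names are hypothetical placeholders, and outu1=/outu2= follow the "I think" above, so check them against the bbmap.sh help text for your BBTools version. The `run` helper and `DRY_RUN` switch are only illustrative scaffolding so the command can be inspected before it is executed.

```shell
# Dry-run by default: print the command instead of executing it.
# Set DRY_RUN=0 to actually run bbmap.sh (must be on PATH).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Map one paired-end sample and keep only the unmapped reads.
# $1 = reference fasta, $2 = R1 fastq, $3 = R2 fastq,
# $4 = unmapped R1 output, $5 = unmapped R2 output
# No out= is given, so the alignments themselves are not written.
map_keep_unmapped() {
  run bbmap.sh ref="$1" in1="$2" in2="$3" outu1="$4" outu2="$5"
}

# Hypothetical file names, for illustration only.
map_keep_unmapped dereplicated_contigs.fasta \
  sample01_R1.fastq.gz sample01_R2.fastq.gz \
  sample01_unmapped_R1.fastq.gz sample01_unmapped_R2.fastq.gz
```

With `DRY_RUN=1` this only prints the bbmap.sh command line. Once it looks right, rerun with `DRY_RUN=0`.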


I'll give this a try. A few questions regarding bbmap:

1. How should I refer to the collection of programs that are installed (I never know)? Is it bbtools?
2. I get a little confused with specifying a prebuilt reference index. Other aligners usually add a suffix to the input reference while bbmap uses a separate folder. Do I just specify that folder?
1
Entering edit mode
1. BBTools refers to the entire suite.
2. If you have a pre-built reference for bbmap, then specify the top-level folder (which should contain the ref folder) using path=. If you are starting with a fasta reference file, then ref= is the correct option, which builds the index on the fly.
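To make the ref= versus path= distinction concrete, here is a minimal sketch that builds the index once into a chosen folder and then reuses it for every sample. The folder name, fasta name, sample prefixes and the `_R1`/`_R2` naming scheme are all assumptions for illustration, as are the `run` helper and `DRY_RUN` switch.

```shell
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 to actually run bbmap.sh (must be on PATH).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the index once. ref= points at the fasta, path= at the
# top-level folder that will hold the ref/ index folder.
# $1 = reference fasta, $2 = index folder
build_index() {
  run bbmap.sh ref="$1" path="$2"
}

# Map one sample against the pre-built index: only path= is given,
# no ref=, so the existing index is reused.
# $1 = index folder, $2 = sample prefix (hypothetical naming scheme)
map_sample() {
  run bbmap.sh path="$1" \
    in1="${2}_R1.fastq.gz" in2="${2}_R2.fastq.gz" \
    outu1="${2}_unmapped_R1.fastq.gz" outu2="${2}_unmapped_R2.fastq.gz"
}

# Hypothetical names, for illustration only.
build_index dereplicated_contigs.fasta metaG_index
for sample in sample01 sample02; do
  map_sample metaG_index "$sample"
done
```

Building the index as a separate step means the expensive indexing is paid once rather than once per metatranscriptome.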

Got it! That's what I was missing: I wasn't using the ref= and path= options correctly.