I concatenated all the fasta files into one file and created a database using makeblastdb. I have reads in fastq file which I believe has been contaminated with bacterial sequences. Since I have made the database, how can I use blastn to query this database and will this give me read counts for all the reads mapping to a particular specific bacterial species?
Hi, for that I have index my combined genome which is around 25G and has 12690 contigs. Do you think aligner would be fine indexing it and then mapping it?
While feasible using Blast for this will be a fairly compute intensive task (and can take a significant amount of time). If you believe that your data is contaminated then I suggest that you use BBSplit from BBMap suite with known genome and collect reads that map to that genome leaving the contamination behind in a different pool.
There are more efficient ways of assigning bacterial species (e.g. kraken) if you want to continue on present path.
Can you be specific about mapping what to what? BBMap should handle a genome that large AFAIK. You will need to have enough RAM available (for reference human genome generally needs about 30G).
I created a reference fasta file for all the bacterial species and combined them into one fasta file. Taking this fasta file as genome, I want to map my paired-end RNA-Seq reads to this fasta file to see how many reads are coming from which bacterial species. Since I combined all the bacterial species, total size of the fasta file is 25G which is very big(bigger than human genome)
See my comment above for feasibility of using that fasta for analysis. Should be possible as long as you have enough RAM available.
Otherwise this is a question of priorities. Are you interested in moving forward with the reads that map to the genome of your interest (which I assume is not of bacterial origin) or is your interest the bacterial reads themselves. There is also the 16S microbial blast index that NCBI makes available. Presumably your data will have 16S reads. Blasting a sample of reads can provide you with some information about the bacterial contamination.
This microbial blast index might give me some idea quickly. To use this microbial blast index, I should just use blastn with my fastq reads on this index? Thanks genomax for the help
You can convert your fastq reads to fasta using reformat.sh in=your_reads.fq out=your_reads.fa from BBMap suite and then on to blast. (use in1= in2= out1= out2= for a PE dataset).
Are they paired-end fastq? Are you going to do any QC prior to looking for contamination?
You can align your reads to the reference fasta using bwa.
Hi, for that I have index my combined genome which is around 25G and has 12690 contigs. Do you think aligner would be fine indexing it and then mapping it?