8.8 years ago
seta
Hi all,
I'm working on a plant RNA-seq analysis and plan to check my assembly against the whole nr or nt database to detect common contaminants such as Homo sapiens and Escherichia coli DNA, mitochondrial and chloroplast sequences, and rRNA. As you all know, BLASTing against nr or nt takes too much time, so I would prefer to use BLAT. Please share your experience with using BLAT for this purpose, as I could not find much information about it. I know about UCSC, but I'm looking for the commands you use to build the database and run the search.
Thanks
Why don't you make a database that contains only common contaminants? You can also make a custom database with the plant genome or related plant genomes, and then BLAST only the remaining contigs against nt.
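To make the "only the remainder" step concrete, here is a minimal Python sketch, not a tested pipeline. It assumes you have already searched your contigs against a small contaminant database and saved the hits in the 12-column tabular format (BLAST `-outfmt 6`, or BLAT `-out=blast8`). The function names and the identity/coverage thresholds are hypothetical choices for illustration, so adjust them to your data.

```python
import csv
import io


def read_fasta_lengths(handle):
    """Return {contig_id: sequence_length} from an open FASTA handle."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig id is the first word after '>'.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def flag_contaminants(tab_handle, lengths, min_identity=95.0, min_coverage=0.5):
    """Return the set of contig ids with a strong hit in the contaminant search.

    Expects 12-column tabular hits: column 1 is the query id, column 3 the
    percent identity, column 4 the alignment length. Coverage is approximated
    as alignment length / contig length (gaps are counted in the alignment).
    The thresholds are assumptions, not recommended values.
    """
    flagged = set()
    for row in csv.reader(tab_handle, delimiter="\t"):
        if len(row) < 4 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        contig, identity, aln_len = row[0], float(row[2]), int(row[3])
        contig_len = lengths.get(contig, 0)
        if contig_len == 0:
            continue  # hit refers to a contig that is not in the FASTA
        if identity >= min_identity and aln_len / contig_len >= min_coverage:
            flagged.add(contig)
    return flagged


def split_contigs(lengths, flagged):
    """Return (contaminant_ids, remainder_ids); remainder keeps FASTA order."""
    remainder = [contig for contig in lengths if contig not in flagged]
    return sorted(flagged), remainder


# Tiny made-up example; real input would be your assembly and hit table.
example_fasta = io.StringIO(">c1 demo\nACGTACGTAC\n>c2\nACGTACGTACGTACGTACGT\n")
example_hits = io.StringIO("c1\tcontam\t99.0\t10\t0\t0\t1\t10\t5\t14\t1e-5\t20\n")
example_lengths = read_fasta_lengths(example_fasta)
contaminants, remainder = split_contigs(
    example_lengths, flag_contaminants(example_hits, example_lengths)
)
print("flagged:", contaminants)
print("to search against nt:", remainder)
```

Only the ids in the remainder list would then need the slow search against nt, which is the point of building the small contaminant database first.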
How would you suggest collecting the above-mentioned common contaminants to build such a database?
Normally more than 90-95% of my reads map to the reference, in which I include the mitochondrial genome, so I don't worry much about it. If there is contamination, it is mostly from the host genome (as expected), so I check only the remaining reads against the salmon genome, which normally gives good coverage. The samples sometimes also contain a small percentage of phage sequences for sequencing error assessment. In your case I would also include the chloroplast genome; I don't see plastid genomes as contamination. I haven't checked for human sequences, but I guess they work under sterile conditions in our lab.