I have an RNA-Seq assembly. Human reads were aligned with HISAT2 and removed before assembly, but some still made it to the assembly stage and were assembled into contigs. I am only interested in the non-human contigs. What is the best way to filter the human contigs out of the assembly, with high sensitivity and specificity, while keeping the non-human contigs?
Options I have considered:
- BLAST against human sequences, keeping the contigs with no hit
- BLAT, doing the same and using pslReps to filter by coverage
- Spaln or GMAP?
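To make the coverage filter concrete, here is a minimal sketch of the idea, not the exact pipeline used. It assumes BLAST tabular output produced with `-outfmt "6 qseqid sseqid pident length qstart qend qlen"`. The function name `human_contigs` and the 0.90 default are hypothetical, and merging HSPs across all subjects before computing coverage is a design choice of this sketch.

```python
from collections import defaultdict


def human_contigs(blast_lines, min_coverage=0.90):
    """Return IDs of contigs whose merged hits cover >= min_coverage of the contig.

    Each line must hold seven tab-separated fields (an assumed layout):
    qseqid, sseqid, pident, length, qstart, qend, qlen.
    """
    intervals = defaultdict(list)  # contig ID -> [(start, end), ...] on the contig
    lengths = {}                   # contig ID -> contig length

    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        qseqid, _sseqid, _pident, _length, qstart, qend, qlen = line.split("\t")
        start, end = sorted((int(qstart), int(qend)))
        intervals[qseqid].append((start, end))
        lengths[qseqid] = int(qlen)

    flagged = set()
    for contig, spans in intervals.items():
        covered = 0
        cur_start = cur_end = None
        # Merge overlapping intervals so that bases hit twice are counted once.
        for start, end in sorted(spans):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start + 1

        if covered / lengths[contig] >= min_coverage:
            flagged.add(contig)
    return flagged
```

Contigs returned by this function would be discarded as human. On a test set of known human transcripts, sensitivity is then the number of flagged IDs divided by the number of input sequences.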
When testing with a small sample of human transcripts (the first 5000 in the Ensembl FASTA), I found that both BLAST and BLAT with minCoverage=90 gave poor sensitivity: only ~3500 of the 5000 (about 70%) were filtered out. Any better ideas?