I have paired-end RNA-seq data (Illumina) from a parasite and I would like to do a de novo assembly with Trinity. I have the reference genome of my host organism, so I can map my data to the host and remove the contaminating reads from the FASTQ files.
My plan is:
- Map my paired-end FASTQ files to the host reference genome with bwa/bowtie/novoalign
- Remove the reads that map to the host from the FASTQ files (cleaning out the contamination)
- Run Trinity for de novo transcript assembly on the remaining reads
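The three steps above might look roughly like the following. This is a minimal sketch that only builds the command lines, assuming bowtie2 as the aligner; every file name and the index prefix (`host_genome.fa`, `reads_R1.fastq.gz`, `host_index`, `host_removed_R%.fastq.gz`) is a hypothetical placeholder. Two caveats apply. First, bowtie2's `--un-conc-gz` writes the pairs that did *not* align concordantly, so a pair where only one mate hit the host is kept. Second, bowtie2 is not splice-aware, so host reads spanning introns may fail to align and slip through.

```python
import shlex
from typing import List


def host_removal_commands(host_fasta: str, r1: str, r2: str,
                          threads: int = 4, max_memory: str = "20G") -> List[List[str]]:
    """Return argv lists for each step of the plan (nothing is executed here)."""
    index = "host_index"                # hypothetical bowtie2 index prefix
    clean = "host_removed_R%.fastq.gz"  # bowtie2 replaces '%' with 1 or 2
    return [
        # Step 1a: build a bowtie2 index for the host genome
        ["bowtie2-build", host_fasta, index],
        # Steps 1b + 2: map pairs to the host and write out the pairs
        # that do not align concordantly; the alignments themselves are discarded
        ["bowtie2", "-p", str(threads), "-x", index, "-1", r1, "-2", r2,
         "--un-conc-gz", clean, "-S", "/dev/null"],
        # Step 3: de novo assembly of the remaining (non-host) pairs
        ["Trinity", "--seqType", "fq",
         "--left", clean.replace("%", "1"), "--right", clean.replace("%", "2"),
         "--max_memory", max_memory, "--CPU", str(threads),
         "--output", "trinity_out_dir"],
    ]


if __name__ == "__main__":
    # Print the commands so the paths can be checked before running anything
    for argv in host_removal_commands("host_genome.fa",
                                      "reads_R1.fastq.gz", "reads_R2.fastq.gz"):
        print(shlex.join(argv))
```

Once the tools are installed, each list can be passed to `subprocess.run(argv, check=True)`. Printing the commands first is a cheap way to catch path mistakes before a long run.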
My questions are:
Can I use aligners such as bwa to align the raw FASTQ files to the host genome and then remove the contaminants from the FASTQ files? I ask because my data come from an RNA-seq project, NOT from DNA.
How can I remove the reads that align to the host genome from the raw FASTQ files (the cleaning step)?
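One way to do the cleaning step in plain Python is to stream both FASTQ files in parallel and drop a pair when its read name is in the set of host-mapped names. This assumes you already have those names, for example from `samtools view -F 4 host.bam | cut -f 1`, where `-F 4` drops unmapped records and `host.bam` is a hypothetical file. It is a sketch, not a tested tool. It also assumes the following:

- FASTQ records are 4 lines each (no wrapped sequences).
- R1 and R2 list the reads in the same order.
- Read names match the SAM QNAME once any `/1` or `/2` suffix is stripped.

Removing the whole pair when either mate mapped keeps R1 and R2 in sync for paired-end assembly.

```python
from typing import Iterator, Set, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield header, seq, plus, qual


def read_name(header: str) -> str:
    """'@READ/1 extra' or '@READ 1:N:0:ACGT' -> 'READ' (the SAM QNAME form)."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def filter_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO,
                 host_names: Set[str]) -> Tuple[int, int]:
    """Write pairs whose name is NOT in host_names; return (kept, removed).

    Assumes both inputs hold the same number of records in the same order.
    """
    kept = removed = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        n1, n2 = read_name(rec1[0]), read_name(rec2[0])
        if n1 != n2:
            raise ValueError(f"R1/R2 out of sync: {n1} vs {n2}")
        if n1 in host_names:  # either mate mapped to the host: drop the pair
            removed += 1
            continue
        r1_out.writelines(rec1)
        r2_out.writelines(rec2)
        kept += 1
    return kept, removed
```

For gzipped files, open the inputs with `gzip.open(path, "rt")` and the outputs with `gzip.open(path, "wt")`, then pass those handles to `filter_pairs`. The host names can be loaded into a `set` with one name per line.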
Alternatively, if you have any other advice on how to prepare data for the Trinity pipeline, I would appreciate it.
Thank you so much for any comments and for sharing your experience.