I would like to find out if any reads in my .fastq file were transcribed from a vector sequence that is not in the human reference genome.
The sequence is available here: https://www.addgene.org/27080/sequences/#depositor-full
I thought I might align the .fastq files to the vector sequence with TopHat, treating the sequence as if it were the reference genome, and see whether any reads align.
However, I'm not sure how to go about doing this.
Should I make the sequence above into a .gtf file somehow?
How do I make the corresponding annotation (.gff) file?
Is there an easier way to do this? For example, could I extract every read sequence from the .fastq file and use grep to check whether it occurs in the vector sequence?
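To make the grep idea concrete, here is a minimal Python sketch of what I have in mind. It assumes standard four-line FASTQ records and a single-record FASTA file for the vector. The function names are my own, not from any library. It only finds reads that occur exactly, in either orientation, inside the vector, so I assume it would miss any read with a sequencing error or one that only partly overlaps the vector, which is why I suspect an aligner is the better route.

```python
# Complement table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Join all non-header lines of a FASTA file into one upper-case string."""
    return "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()


def fastq_records(lines):
    """Yield (read_id, sequence) pairs, assuming four lines per record."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").strip().upper()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line
        read_id = (header[1:].split() or [""])[0]
        yield read_id, seq


def matching_reads(fastq_lines, vector_seq):
    """Return IDs of reads found exactly in the vector, in either orientation."""
    hits = []
    for read_id, seq in fastq_records(fastq_lines):
        if seq and (seq in vector_seq or reverse_complement(seq) in vector_seq):
            hits.append(read_id)
    return hits


# Example usage with hypothetical file names:
# with open("vector.fa") as fa, open("reads.fastq") as fq:
#     print(matching_reads(fq, read_fasta(fa)))
```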
EDIT: I am aligning Illumina RNA-Seq .fastq files. Also, I'd appreciate any resources folks have regarding how to modify or add chromosomes to a reference file!
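On adding a chromosome: my understanding is that this amounts to appending one more FASTA record to the reference file and then rebuilding the aligner's index from the combined file. Below is a rough sketch of the appending step only. The function name and the contig name `vector_27080` are placeholders I made up, and I have not confirmed that this is the recommended workflow.

```python
def add_contig(reference_lines, contig_lines, contig_name):
    """Yield the reference FASTA lines, then the contig under a new header.

    Streams line by line so a large reference is never held in memory.
    Raises ValueError if contig_name is already used in the reference.
    """
    existing = set()
    for line in reference_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                existing.add(fields[0])
        yield line
    if contig_name in existing:
        raise ValueError("contig name already in reference: " + contig_name)
    yield ">" + contig_name
    for line in contig_lines:
        line = line.strip()
        # Skip the contig's own header; it is replaced by contig_name.
        if line and not line.startswith(">"):
            yield line


# Example usage with hypothetical file names:
# with open("genome.fa") as ref, open("vector.fa") as vec, \
#         open("genome_plus_vector.fa", "w") as out:
#     for line in add_contig(ref, vec, "vector_27080"):
#         out.write(line + "\n")
```

Any existing index was built from the old file, so it would have to be regenerated from the combined FASTA before aligning.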