I am working on a collaboration and we recently generated some RNA-seq data. The lab that we are working with inserted a human gene into the Drosophila genome to see the effects of a specific mutation. They were able to insert the gene and did some RNA sequencing. I was given the task of working with the RNA-seq data but I have no idea how to find the gene that was inserted. I've worked with RNA-seq data in the past but never like this. How am I supposed to find the gene? I know there are programs out there to find novel isoforms in RNA-seq data, but what do you do when you're looking for an entire gene?
I thought about trying to somehow add the human gene to the Drosophila index, but the index is very specific about gene location, right?
I think this is an easy question but I'm not sure where to begin here. Any help would be amazing.
Thanks!
Just add the sequence of the human gene to the Drosophila genome as a new "chromosome" and analyze as before. You will have to re-create your own aligner indexes with this modified genome.
How would I go about doing that? I'm not sure how to alter the transcript FASTA file to do that.
Put your human gene sequence in FASTA format in a file. Do
cat transcript.fa human.fa > mod_transcript.fa
Then build indexes using the mod_transcript.fa file.
If you just want to see if there are reads expressed from the human gene, then you could filter the reads against the human gene sequence using bbduk.sh from BBMap. Reads going into the matched output file, matched.fq.gz, will be from the human gene.
I will give that a try! Thanks for the help.
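For anyone finding this thread later, the two suggestions above can be sketched as a few shell functions. This is only a rough sketch under assumptions: the function names are hypothetical, `k=31` is an assumed k-mer length rather than a value given in the thread, and the concatenation uses `awk 1` instead of plain `cat` so that a missing final newline in the first file cannot glue the human header onto the last Drosophila sequence line. Building the aligner index is left out because it depends on which aligner you use.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of any tool.

# Append the human gene FASTA to the Drosophila reference FASTA.
# Usage: make_modified_fasta transcript.fa human.fa mod_transcript.fa
make_modified_fasta() {
    local fly_fa="$1" human_fa="$2" out_fa="$3"
    # "awk 1" prints every line and always terminates it with a newline,
    # so the two files are joined cleanly even if the first one lacks
    # a trailing newline.
    awk 1 "$fly_fa" "$human_fa" > "$out_fa"
}

# Count FASTA records (header lines starting with '>').
# Usage: count_fasta_records mod_transcript.fa
count_fasta_records() {
    grep -c '^>' "$1"
}

# Collect reads that share k-mers with the human gene using BBDuk.
# Usage: extract_human_reads reads.fq.gz human.fa matched.fq.gz
extract_human_reads() {
    local reads="$1" human_fa="$2" matched="$3"
    if ! command -v bbduk.sh >/dev/null 2>&1; then
        echo "bbduk.sh not found on PATH (it ships with BBMap)" >&2
        return 1
    fi
    # in=   input reads
    # ref=  sequence(s) to match reads against (here: the human gene)
    # outm= output file for reads that match the reference
    # k=    k-mer length used for matching (31 is an assumed value)
    bbduk.sh in="$reads" ref="$human_fa" outm="$matched" k=31
}
```

After sourcing the file, something like `make_modified_fasta transcript.fa human.fa mod_transcript.fa` followed by `count_fasta_records mod_transcript.fa` should report one more record than the original Drosophila FASTA if the human file holds a single sequence. Then rebuild the index from mod_transcript.fa with your aligner's own index command.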