6.7 years ago
friedrichlab
I have a large BAM file containing cleaned RNA-seq reads from a partnered sequencing project. My goal is to align existing protein sequences to the RNA-seq data in order to identify the corresponding sequence regions. I have looked into several tools for this (command-line BLAST, Genome Workbench, Picard) but have not had much success, as my bioinformatics experience only goes as far as R, MUSCLE, and the online BLAST tool. Does anyone have recommendations or experience with something like this that they could offer?
Thank you in advance.
You should do the alignments the other way around: map the reads against a protein database. DIAMOND is one option, but it will require significant hardware to be available.
Why are you going this route, though? Is the data already aligned to a reference? If so, you should be able to see which genes the reads fall in.
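For anyone who wants to try the reads-against-proteins approach, here is a minimal sketch of how it could be scripted. It assumes `samtools` and `diamond` are installed and on your PATH; the file names (`cleaned_reads.bam`, `proteins.fasta`), the working directory, and the thread count are placeholders, not anything from the original post. The script only builds and prints the commands by default, so you can check them before running anything.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(bam, proteins_fasta, workdir="diamond_run", threads=4):
    """Build the three command lines as argument lists without running them.

    All paths and the thread count are illustrative placeholders.
    """
    work = Path(workdir)
    reads_fasta = work / "reads.fasta"   # reads dumped from the BAM file
    db_prefix = work / "proteins"        # DIAMOND database prefix
    hits = work / "hits.tsv"             # tabular alignment output

    return {
        # samtools fasta writes the reads to stdout; we redirect it to a file.
        "extract": ["samtools", "fasta", str(bam)],
        # Build a DIAMOND database from the protein FASTA file.
        "makedb": ["diamond", "makedb", "--in", str(proteins_fasta),
                   "--db", str(db_prefix)],
        # blastx translates the nucleotide reads and aligns them to the proteins;
        # --outfmt 6 requests BLAST-style tabular output.
        "blastx": ["diamond", "blastx", "--db", str(db_prefix),
                   "--query", str(reads_fasta), "--out", str(hits),
                   "--outfmt", "6", "--threads", str(threads)],
        "reads_fasta": reads_fasta,
        "hits": hits,
    }


def run_pipeline(bam, proteins_fasta, workdir="diamond_run", threads=4):
    """Run the three steps in order and return the path to the hits table."""
    plan = build_commands(bam, proteins_fasta, workdir, threads)
    Path(workdir).mkdir(parents=True, exist_ok=True)
    with open(plan["reads_fasta"], "w") as out:
        subprocess.run(plan["extract"], stdout=out, check=True)
    subprocess.run(plan["makedb"], check=True)
    subprocess.run(plan["blastx"], check=True)
    return plan["hits"]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    plan = build_commands("cleaned_reads.bam", "proteins.fasta")
    for step in ("extract", "makedb", "blastx"):
        print(shlex.join(plan[step]))
```

Once the printed commands look right, calling `run_pipeline("cleaned_reads.bam", "proteins.fasta")` would execute them. With a large BAM file the `blastx` step is the expensive one, so it may be worth testing on a small subset of reads first.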
Do you want to annotate the genome or the transcriptome? Or do you want to do something else entirely? You are describing the task you want to perform, but not the goal you want to achieve. Since this task seems a bit odd, someone may be able to help more if you tell us what your ultimate goal is.