I have several RNA-seq BAM files from cancer patients (mapped with BWA and processed with GATK IndelRealigner). I need to check whether any copy-number changes in repeats have appeared, across the whole transcriptome. I am looking for a pipeline that lets me detect these repeats with our repeat detection software, which takes protein sequences in FASTA format as input.
So I would like to get each patient's personal transcriptome in FASTA format, since that is the only format I can feed to our software. What should I do next? For example, can I use Blastx, which compares nucleotide sequences directly to a reference proteome? Does it take the reading frame into account? And how do I get FASTA output containing each protein? It is important that the pipeline pays special attention to repeats that deviate from the reference.
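To make the reading-frame question concrete, here is roughly the kind of output I am after. This is only a minimal Python sketch, assuming I already had a patient-specific transcript as a plain nucleotide string. It translates the sequence in all six frames with the standard genetic code and prints the results as protein FASTA. The function names and the record ID `tx1` are made up for illustration, and this is not meant to reproduce what Blastx does internally.

```python
# Minimal six-frame translation sketch (standard genetic code, NCBI table 1).
# All names here (six_frame_records, tx1, ...) are hypothetical.

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"   # codons starting with T
         "LLLLPPPPHHQQRRRR"   # codons starting with C
         "IIIMTTTTNNKKSSRR"   # codons starting with A
         "VVVVAAAADDEEGGGG")  # codons starting with G
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_records(name, seq):
    """Return (record_id, protein) pairs for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    records = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            records.append((f"{name}_frame{strand}{offset + 1}",
                            translate(s[offset:])))
    return records


if __name__ == "__main__":
    # Toy transcript; a real one would come from the patient-specific FASTA.
    for record_id, protein in six_frame_records("tx1", "ATGGCCTAA"):
        print(f">{record_id}\n{protein}")
```

Stop codons show up as `*`, so I assume I would still have to pick the correct frame (or the longest open reading frame) per transcript before passing the proteins to the repeat detection software.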
Many thanks for your help!