Hi all,
I have Illumina reads (36 bp, single-end) from mRNA sequencing of human samples in two different conditions. I want to find simple sequence repeat (SSR) markers that are likely to be informative between the experimental conditions. Since the genome is available, I mapped the reads to the reference genome (global alignment) and extracted the consensus sequence from the BAM file. I set the insertion/deletion cost to its lowest level during mapping; please let me know if other mapping options would be more useful for this purpose. However, the consensus sequence was full of Ns, indicating regions where no reads mapped. Should I search for SSRs in this consensus sequence, or do you have alternative suggestions and comments for SSR discovery in RNA-seq data when a reference genome is available?
Thanks
Thanks, friend. The samples come from healthy and diseased humans; unfortunately, we do not have the genome sequence of the diseased group. Although RNA-seq data may not be optimal for SSR discovery, it is the only kind of data available here. Any suggestions?
Assuming the SSRs are identified from the reference genome, how do I find which SSRs are actually expressed?
You would need to write a program.
Sorry, could you please explain more?
That's not very helpful.
WouterDeCoster, as you suggested, assuming the SSRs are identified from the reference genome, how do I find which SSRs are actually expressed?
Once you have generated a BED file of SSR locations, you can use bedtools to find the SSR intervals that are covered by reads in the alignment.
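To make the idea concrete, here is a rough Python sketch of both steps on toy data: finding perfect SSRs in a reference sequence, then counting reads per SSR interval. This is only an illustration, not a replacement for a dedicated SSR finder or for bedtools. The function names, the thresholds (motifs of 2 to 6 bp, at least 4 copies) and the toy reads are all assumptions of mine, not something from this thread. In practice the counting step is what something like `bedtools intersect -a ssr.bed -b aln.bam -c` would give you (file names hypothetical).

```python
import re


def find_ssrs(seq, chrom, min_unit=2, max_unit=6, min_repeats=4):
    """Return perfect tandem repeats as (chrom, start, end, motif).

    Coordinates are 0-based, half-open, as in BED. The thresholds are
    illustrative assumptions, not recommended values.
    """
    # A lazy group picks the shortest motif; the backreference requires
    # at least (min_repeats - 1) further copies directly after it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs reported as e.g. "AA" repeats
        hits.append((chrom, m.start(), m.end(), motif))
    return hits


def count_reads_per_ssr(ssrs, reads, require_span=False):
    """Count reads per SSR interval.

    reads: list of (chrom, start, end) alignments, 0-based half-open.
    require_span=True counts only reads covering the whole repeat, which
    is what you need to see its length in a read.
    Naive O(n*m) loop; fine for a toy example only.
    """
    out = []
    for chrom, start, end, motif in ssrs:
        n = 0
        for r_chrom, r_start, r_end in reads:
            if r_chrom != chrom:
                continue
            if require_span:
                hit = r_start <= start and r_end >= end
            else:
                hit = r_start < end and r_end > start
            n += hit
        out.append((chrom, start, end, motif, n))
    return out


# Toy data (hypothetical): one short "chromosome" and four 36 bp reads.
toy_seq = "GGTT" + "AC" * 5 + "TTGG" + "AGC" * 4 + "TT"
toy_reads = [("chr1", 0, 36), ("chr1", 10, 46), ("chr1", 40, 76), ("chr2", 4, 40)]
toy_ssrs = find_ssrs(toy_seq, "chr1")
for row in count_reads_per_ssr(toy_ssrs, toy_reads, require_span=True):
    print("\t".join(str(x) for x in row))
```

With 36 bp reads the `require_span` option matters: a read that only touches part of a repeat shows that the locus is expressed, but it cannot tell you the repeat length, so long SSRs may not be scorable from this data at all.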