Hi everyone,
I am struggling to get GMAP to do what I want. I want to map cDNA sequences of very similar genes onto a genome, but I end up with mappings spanning 2-5 times the size of the genes I am looking at. I therefore tried to constrain GMAP's behaviour by setting --intronlength=10000, --split-large-introns and --totallength=15000. Unfortunately, this does not keep GMAP from reporting very large introns of up to 20,000 nt.
What am I doing wrong?
I really miss good documentation for GMAP: the help page is rather short and the paper focuses on the algorithm itself.
Thanks for any help & all the best!
Can you confirm that it is mapping a cDNA such that the alignment spans two of those similar genes? Or are they all valid single genes (so no cross-matching)?
On a side note: a 20 kb intron is not that large.
I can confirm it, using a reference genome with fully annotated genes. From these reference genes I know that the genes I am looking for have a specific length, which is why I tried to constrain the intron size. I suspect there are many genes in my region, but some annotations spanned the whole region.
Is it possible to map the region with a different tool, maybe? Or should I try gene prediction and then compare the obtained sequences? I am very new to this and feel that I am missing something obvious.
ok.
What I used to do to get around this is to split the reference into parts that roughly encompass each of those genes (CDS/mRNA + some context) and use those parts as the reference for the mapping step.
There is, however, a serious drawback: the mapper may not find the optimal alignment, as you only give it a subpart of the actual reference.
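In case it helps, here is a minimal Python sketch of that splitting step, not a tested pipeline. It assumes you already have 1-based, inclusive gene coordinates (as in a GFF file) from your annotation. The gene IDs, coordinates, flank size and function names below are all made up for illustration. Each record keeps its offset in the header so that mapped positions can be converted back to full-reference coordinates.

```python
# Sketch: cut one sub-reference per gene (gene span plus flanking context).
# All names, coordinates and the flank size are hypothetical placeholders.

def read_fasta(lines):
    """Parse FASTA lines into a dict of {sequence_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def cut_region(seq, start, end, flank):
    """Return (offset, subsequence) for a 1-based inclusive gene span plus
    `flank` bases on each side, clamped to the sequence ends."""
    lo = max(0, start - 1 - flank)
    hi = min(len(seq), end + flank)
    return lo, seq[lo:hi]


def split_reference(genome, genes, flank=2000):
    """Build one FASTA record per gene as a 'header\\nsequence' string."""
    records = []
    for gene_id, chrom, start, end in genes:
        lo, sub = cut_region(genome[chrom], start, end, flank)
        # Header stores the 1-based span covered in the original reference.
        header = f">{gene_id} {chrom}:{lo + 1}-{lo + len(sub)}"
        records.append(f"{header}\n{sub}")
    return records


if __name__ == "__main__":
    # Toy genome and toy gene coordinates, for demonstration only.
    genome = read_fasta([">chr1", "ACGTACGTAC", "GGGGCCCCTT"])
    genes = [("geneA", "chr1", 5, 8), ("geneB", "chr1", 15, 18)]
    for record in split_reference(genome, genes, flank=2):
        print(record)
```

You would write each record to its own FASTA file and build a separate reference from it. How much flank to keep is a judgment call: too little and you cut off UTRs, too much and the neighbouring paralog ends up in the same piece again.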
Thanks, I might just try that. However, the GMAP documentation is, in my eyes, just not enough. Is or was there any good source to learn more about what the flags actually mean and do? I found only dead links.
It's been quite a while since I last used GMAP, but most of my knowledge comes from doing and trying things. So not very helpful, sorry :)
I guess I'll have to go by trial and error as well :)