I mapped my de novo assembled transcripts to the genome using GMAP, but a large number of transcripts still show similarity to the draft genome sequences. My aim is to extract the unique sequences that do not map to the draft genome.
I have 32 paired-end RNA-seq libraries. Can I pool all raw R1 reads and all raw R2 reads into two separate files, map those raw reads to the draft genome, extract the unmapped R1 and R2 reads into two separate files, and then run a de novo assembly on them?
Which software would you suggest for this analysis?
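Once you know which transcripts GMAP aligned, keeping only the rest is a simple set-difference filter over the FASTA file. The sketch below assumes you have already pulled the mapped transcript IDs out of the GMAP output into a Python set (how you do that depends on the output format you chose and is not shown); `mapped_ids` and the function names are hypothetical, and IDs are taken as the first word of each FASTA header.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line.strip():
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def unmapped_transcripts(lines: Iterable[str], mapped_ids: Set[str]) -> List[Tuple[str, str]]:
    """Keep records whose ID (first word of the header) is not in mapped_ids."""
    return [
        (h, s) for h, s in read_fasta(lines)
        if (h.split() or [""])[0] not in mapped_ids
    ]
```

For example, `unmapped_transcripts([">tr1 len=6", "ACGTAC", ">tr2", "GGGG", "CC"], {"tr1"})` keeps only `tr2`, with its two sequence lines joined.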
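The key constraint when extracting unmapped paired reads is keeping R1 and R2 in sync, so a pair is either kept in both files or dropped from both. The mapper can usually write unmapped reads for you (check its documentation), but this minimal sketch shows the underlying logic. It assumes a hypothetical set `mapped_ids` of pair names where at least one mate mapped, and that mate names differ only by an optional `/1` or `/2` suffix; both are assumptions about your data, not something stated in the thread.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, ...]  # (header, sequence, plus line, qualities)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; stops at the first incomplete record."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if len(chunk) < 4:
            return
        yield tuple(chunk)


def pair_id(header: str) -> str:
    """First word of the header without '@' and without a /1 or /2 suffix."""
    name = (header[1:].split() or [""])[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def split_unmapped_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str],
                         mapped_ids: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Return (R1, R2) records for pairs where neither mate is in mapped_ids."""
    out1, out2 = [], []
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        id1, id2 = pair_id(rec1[0]), pair_id(rec2[0])
        if id1 != id2:
            raise ValueError(f"R1/R2 out of sync: {id1} vs {id2}")
        if id1 not in mapped_ids:
            out1.append(rec1)
            out2.append(rec2)
    return out1, out2


def to_fastq(records: List[Record]) -> str:
    """Serialize records back to FASTQ text."""
    return "".join("\n".join(rec) + "\n" for rec in records)
```

Writing `to_fastq(out1)` and `to_fastq(out2)` to two files gives the separate R1/R2 inputs for the de novo assembly.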
Hi Brian,
Thanks for helping me.
I have downloaded BBMap. Before running the command, do I need to index the reference genome?
The command I gave will do the indexing first and then map. Alternatively, you could do it in two steps, like this:
Either way gives the same result.
Hi Brian
Thanks
I got the results below, and I have some questions:
1) During mapping, how many mismatches are allowed, and is there an option to adjust that limit?
2) How does it map splice variants?
BBMap does not have a specific mismatch limit. To quote @Brian from a recent answer:
Splice variants are mappable depending on the settings used for maxindel and intronlen.
Hmmm, that was my mistake; for some reason I neglected to mention maxindel. When mapping RNA-seq data to a genome, maxindel is a useful flag to adjust. The default (16000) is fine for fungi and many plants, which have short introns, but for organisms such as mammals, which have long introns, I suggest adding the flags "maxindel=400000 intronlen=10". That allows mapping across introns of up to roughly 400 kbp. "intronlen" is normally unnecessary but may affect some downstream programs.
Thanks,
I want to map the raw reads to CDS instead of the genome. Should I also run this with the default settings?
Yes, for mapping RNA-seq reads to transcripts default settings are fine. You may want to add "ambig=all" because some transcripts have multiple isoforms.
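The flag advice in this thread depends on two things: whether the reference is a genome or a transcript/CDS set, and whether the organism has long introns. This small helper simply encodes those recommendations so they are applied consistently across 32 libraries; the function name and its parameters are hypothetical, and it only returns the flags quoted above, to be appended to whatever bbmap.sh command you already use.

```python
from typing import List


def suggested_bbmap_flags(reference: str, long_introns: bool = False) -> List[str]:
    """Return the extra BBMap flags recommended in this thread.

    reference: "genome" or "transcripts" (a CDS/transcript reference).
    long_introns: True for organisms such as mammals.
    """
    if reference == "transcripts":
        # Defaults are fine; ambig=all helps because isoforms share sequence.
        return ["ambig=all"]
    if reference != "genome":
        raise ValueError("reference must be 'genome' or 'transcripts'")
    if long_introns:
        # Allows spliced alignments across introns of up to ~400 kbp.
        return ["maxindel=400000", "intronlen=10"]
    # Default maxindel (16000) suits short-intron genomes (fungi, many plants).
    return []
```

For a mammalian genome, `" ".join(suggested_bbmap_flags("genome", long_introns=True))` gives `maxindel=400000 intronlen=10`.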