I mapped my query reads to the genome using bwa. I want to extract the part of each query sequence that aligned to a gene feature. Column 10 (SEQ) of the SAM output shown by samtools gives the query sequence that mapped to the genome, but in many cases that sequence extends beyond the gene feature.
I am interested in extracting only the query bases that mapped within the gene feature. Is there a way to do this? I have looked at seqkit locate and grep, but neither is feasible here.
genome   mappedcoordinate_start  mappedcoordinate_end  query_ID  gene_ID  gene_start  gene_end
Xgenome  26                      3000                  read_1    gene_1   30          740
I want to extract the query sequence corresponding to chr:30-740 instead of chr:26-3000. I have 2 million of these, so I need a tool; I can't do this manually.
I need the query sequence itself because we have decided to probe on it.
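To make the goal concrete, the operation I have in mind could be sketched roughly as follows. This is only an illustration in plain Python, not an existing tool: the function name `clip_read_to_interval` is made up, and it assumes that for each read the 1-based mapping position (SAM column 4), the CIGAR string (column 6), and the sequence (column 10) are available, and that the gene coordinates are 1-based and inclusive.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_read_to_interval(pos, cigar, seq, start, end):
    """Return the read bases aligned within [start, end] on the reference.

    pos        : 1-based leftmost mapping position (SAM column 4)
    cigar      : CIGAR string (SAM column 6)
    seq        : read sequence as stored in the SAM record (column 10)
    start, end : 1-based, inclusive interval (assumed gene coordinates)
    """
    ref = pos   # next reference coordinate to be consumed (1-based)
    qry = 0     # next offset into seq to be consumed (0-based)
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned block: consumes both reference and read.
            lo = max(ref, start)
            hi = min(ref + n - 1, end)
            if lo <= hi:
                out.append(seq[qry + (lo - ref): qry + (hi - ref) + 1])
            ref += n
            qry += n
        elif op == "I":
            # Insertion sits between reference positions ref-1 and ref;
            # keep it only if both flanking positions are inside the interval.
            if start < ref <= end:
                out.append(seq[qry:qry + n])
            qry += n
        elif op == "S":
            # Soft clip: bases are present in seq but not aligned.
            qry += n
        elif op in "DN":
            # Deletion or skipped region: consumes reference only.
            ref += n
        # H (hard clip) and P (padding) consume neither reference nor read.
    return "".join(out)
```

For the example row above, the call would look like `clip_read_to_interval(26, cigar, seq, 30, 740)`, with `cigar` and `seq` taken from the SAM record of read_1. A tool that does the equivalent over all the reads is what I am looking for.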
Appreciate all help.