Post-mapping: extract the query sequence mapped to each gene feature
10 months ago
L_bioinfo • 0

I mapped my query reads to the genome using bwa, and I want to extract the part of each query sequence that matched a gene feature. I found that column 10 of the SAM output (viewed with samtools) gives the query sequence that matched the genome, but in many cases that sequence extends beyond the gene feature.
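The reason column 10 extends past the gene is that SEQ (column 10) holds the whole stored read, soft-clipped bases included, while the reference span it aligns to comes from POS (column 4) plus the reference-consuming CIGAR operations (column 6). A minimal pure-Python sketch of that relationship follows. The helper names `parse_sam_fields` and `reference_end` are hypothetical, and no samtools binding is assumed.

```python
import re

def parse_sam_fields(line):
    """Pick out the SAM columns relevant here (hypothetical helper).

    Column numbers follow the SAM spec: 1 QNAME, 3 RNAME, 4 POS, 6 CIGAR, 10 SEQ.
    """
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0],
        "rname": f[2],
        "pos": int(f[3]),  # 1-based leftmost aligned reference position
        "cigar": f[5],
        "seq": f[9],       # full stored read sequence, soft clips included
    }

def reference_end(pos, cigar):
    """1-based inclusive reference end of the alignment.

    Only M, D, N, = and X consume reference bases; S, H, I and P do not.
    Unmapped records (CIGAR '*') are not handled here.
    """
    span = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in "MDN=X")
    return pos + span - 1

if __name__ == "__main__":
    line = "read_1\t0\tXgenome\t26\t60\t5S10M2D3M\t*\t0\t0\tACGTACGTACGTACGTAC\t*"
    rec = parse_sam_fields(line)
    # 18 stored bases, but only 13 are aligned, covering Xgenome:26-40
    print(len(rec["seq"]), reference_end(rec["pos"], rec["cigar"]))
    # prints: 18 40
```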

I am interested only in the query bases that map within the gene feature. Is there a way to do this? I have looked at seqkit locate and seqkit grep, but they don't seem suitable.

genome    mapped_start  mapped_end  query_ID  gene_ID  gene_start  gene_end
Xgenome   26            3000        read_1    gene_1   30          740

I want to extract the query sequence corresponding to chr:30-740 instead of chr:26-3000. I have 2 million of these, so I need a tool; I can't do this manually.
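One way to do this without a dedicated tool is to walk each record's CIGAR string and keep only the query bases whose aligned reference position falls inside the gene. Below is a minimal pure-Python sketch of that idea. It assumes SAM-style inputs (POS from column 4, CIGAR from column 6, SEQ from column 10) and 1-based inclusive gene coordinates as in the table above; the function name `clip_read_to_region` is hypothetical. Insertions between two in-gene bases are kept, deleted reference bases contribute nothing, and soft-clipped bases are never returned. Because SAM stores SEQ in reference orientation, the result for a reverse-strand read (FLAG 0x10) is the reverse complement of the original read segment.

```python
import re

# CIGAR operations that consume query and/or reference bases (SAM spec).
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")

def clip_read_to_region(pos, cigar, seq, region_start, region_end):
    """Return the part of `seq` aligned to reference positions
    region_start..region_end (1-based, inclusive), or None if no aligned
    base of the read falls inside the region.

    pos   -- SAM column 4 (1-based leftmost aligned reference position)
    cigar -- SAM column 6
    seq   -- SAM column 10
    """
    ref_pos = pos    # next reference position to be consumed
    query_pos = 0    # next index into seq
    first_q = last_q = None
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in CONSUMES_QUERY and op in CONSUMES_REF:
            # Aligned block: query_pos + i pairs with ref_pos + i.
            lo = max(ref_pos, region_start)
            hi = min(ref_pos + length - 1, region_end)
            if lo <= hi:
                if first_q is None:
                    first_q = query_pos + (lo - ref_pos)
                last_q = query_pos + (hi - ref_pos)
        if op in CONSUMES_QUERY:
            query_pos += length
        if op in CONSUMES_REF:
            ref_pos += length
    if first_q is None:
        return None
    # Slicing between the first and last in-region aligned bases also keeps
    # any inserted bases that lie between them.
    return seq[first_q:last_q + 1]

if __name__ == "__main__":
    # 2 soft-clipped bases, then 8 aligned bases covering reference 26-33;
    # only reference 30-33 lies inside gene_1 (30-740).
    print(clip_read_to_region(26, "2S8M", "NNACGTACGT", 30, 740))
    # prints: ACGT
```

Looping this over every (alignment, gene) pair and writing `>query_ID_gene_ID` FASTA records would give one sequence per read and gene; for 2 million records, a binding such as pysam would avoid parsing SAM text by hand.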

I need the query sequence itself because we have decided to use it for probe design.

I appreciate any help.

samtools seqkit • 733 views

"I wish to extract the query sequence that matched with the gene feature"

What do you mean?


I mapped the query reads to the reference genome. My reads are long reads, and each one can cover more than one gene at a time. I want to extract the part of each query sequence that mapped to each particular gene.
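Since one long read can span several genes, each alignment first has to be paired with every gene it overlaps, with the gene interval clipped to the aligned span. A minimal sketch follows. It assumes both intervals are 1-based inclusive on the same sequence and that genes arrive as a simple list of `(gene_id, start, end)` tuples (a hypothetical input format; in practice they would come from a GTF/GFF or BED file, keeping in mind that BED starts are 0-based). `gene_2` and `gene_3` in the example are made up to show a partial overlap and a miss. A linear scan is used for clarity; for 2 million alignments a sorted sweep or interval tree would be faster.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # (gene_id, start, end), 1-based inclusive

def overlapping_genes(read_start: int, read_end: int,
                      genes: List[Gene]) -> List[Tuple[str, int, int]]:
    """Return (gene_id, clip_start, clip_end) for every gene that overlaps
    the aligned reference span read_start..read_end (1-based inclusive).

    clip_start..clip_end is the part of the gene actually covered by the
    read, i.e. the region whose query bases should be extracted.
    """
    hits = []
    for gene_id, g_start, g_end in genes:
        lo = max(read_start, g_start)
        hi = min(read_end, g_end)
        if lo <= hi:  # non-empty intersection
            hits.append((gene_id, lo, hi))
    return hits

if __name__ == "__main__":
    genes = [("gene_1", 30, 740), ("gene_2", 2900, 3500), ("gene_3", 4000, 5000)]
    # read_1 from the table is aligned to 26-3000
    print(overlapping_genes(26, 3000, genes))
    # prints: [('gene_1', 30, 740), ('gene_2', 2900, 3000)]
```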


If you are interested in extracting all query sequences that fall within a region (gene), you can try this: "Isolating reads from specific region from bam file".

samtools ampliconclip may also be applicable if you want to truncate the reads (http://www.htslib.org/doc/samtools-ampliconclip.html).
