Question

Finding gene sequences from WGS data

0

5 months ago

analyst ▴ 70

I have performed variant calling and annotation on WGS data. Now I need to get the sequences of a few genes that contain variants.

How can I get the sequences of specific variant-containing genes?

gene WGS • 1.0k views

ADD COMMENT • link updated 5 months ago by swbarnes2 15k • written 5 months ago by analyst ▴ 70

0

I know this isn't the exact answer to your question, but the most common workflow for seeing the impact of variants on genes is to run a variant effect predictor.

ADD REPLY • link 5 months ago by cmdcolin ★ 4.4k

2

OP seems to have at least tried doing this (if it is the same data): building snpeff database for plant

ADD REPLY • link 5 months ago by GenoMax 154k

0

Yes, I have done variant annotation.

It's rice data; I used the available rice database from snpEff.

My PI also wants to perform structural analysis, such as comparing the structure of the normal gene with that of the annotated gene containing the variant.

Therefore, I need to extract the sequences of only 3 or 4 genes from our WGS data.

Your guidance is highly appreciated.

Thank you!

ADD REPLY • link 5 months ago by analyst ▴ 70

score 2 · Answer 1 · 2025-05-27

2

5 months ago

swbarnes2 15k

It's tricky to get full sequences out of a BAM, so your best bet is to build an altered consensus sequence from your original reference FASTA and your VCF.
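To make the idea concrete, here is a minimal Python sketch of what "reference plus VCF gives consensus" means. It is a toy illustration, not a replacement for a real tool: on actual files a dedicated program such as `bcftools consensus` would normally be used. The function name `apply_variants` and the toy sequence and calls are hypothetical, and the sketch assumes one sample, non-overlapping variants, and that every ALT allele should be applied (as for homozygous calls).

```python
def apply_variants(ref_seq, variants):
    """Return ref_seq with (pos, ref, alt) variants applied.

    pos is 1-based, as in a VCF record. Variants are assumed not to overlap.
    """
    seq = ref_seq
    # Work from right to left so an indel never shifts the positions
    # of variants that still have to be applied.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        end = start + len(ref)
        # Sanity check: the VCF REF allele must match the reference sequence.
        if seq[start:end].upper() != ref.upper():
            raise ValueError(
                f"REF mismatch at {pos}: expected {ref}, found {seq[start:end]}"
            )
        seq = seq[:start] + alt + seq[end:]
    return seq


# Toy data (hypothetical): one SNV, one insertion, one deletion.
reference = "ACGTACGTAC"
calls = [(2, "C", "T"), (5, "A", "AGG"), (8, "TA", "T")]

consensus = apply_variants(reference, calls)
print(consensus)  # ATGTAGGCGTC
```

The REF check is worth keeping in any real workflow too: a mismatch usually means the VCF was called against a different reference build than the FASTA being edited.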

ADD COMMENT • link 5 months ago by swbarnes2 15k

0

Thanks swbarnes2!

I need to extract the sequences of only 3 or 4 genes, not all genes.

ADD REPLY • link 5 months ago by analyst ▴ 70

2

It's probably simpler to just make the whole altered consensus, then pick out what you want, instead of only making the consensus for 4 regions. You can also then realign to that consensus and see if your genes of interest look good in IGV.
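One thing to watch when picking genes out of a whole-genome consensus is that indels change sequence length, so gene coordinates taken from the reference annotation no longer line up exactly. The sketch below shows one way to shift a reference interval into consensus coordinates. The function name `consensus_interval`, the toy sequences, and the gene coordinates are all hypothetical, and it assumes non-overlapping variants, none of which straddles a gene boundary.

```python
def consensus_interval(variants, start, end):
    """Map a 1-based inclusive reference interval to consensus coordinates.

    variants are (pos, ref, alt) tuples with 1-based pos, as in a VCF.
    """
    shift_before = 0  # net length change from variants entirely upstream
    shift_inside = 0  # net length change from variants starting inside
    for pos, ref, alt in variants:
        delta = len(alt) - len(ref)
        if pos + len(ref) - 1 < start:
            shift_before += delta
        elif start <= pos <= end:
            shift_inside += delta
    return start + shift_before, end + shift_before + shift_inside


# Toy data (hypothetical): the consensus is the reference with the calls applied.
reference = "ACGTACGTAC"
calls = [(2, "C", "T"), (5, "A", "AGG"), (8, "TA", "T")]
altered = "ATGTAGGCGTC"

# A made-up "gene" at reference positions 4..9 (TACGTA).
new_start, new_end = consensus_interval(calls, 4, 9)
gene_seq = altered[new_start - 1:new_end]
print(new_start, new_end, gene_seq)  # 4 10 TAGGCGT
```

If the calls contain only SNVs, the shift is zero and the annotation coordinates can be used on the consensus directly.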

ADD REPLY • link 5 months ago by swbarnes2 15k