Mapping whole genome sequencing of several strains of organism A to gene X


3.6 years ago

bagheri • 0

Hi everyone. I am stuck on my NGS project and was wondering if someone could advise me.

I have downloaded Illumina paired-end whole genome reads for several M. tuberculosis strains from the ENA database. I have an 1800 bp gene (gene X) from M. tuberculosis, and I want to investigate SNP variation in gene X among the different strains.

After evaluating read quality and trimming the reads, is it OK to map Illumina DNA-seq reads against a single gene? In the literature, reads are always mapped against the whole reference genome rather than a single gene, whereas my project seems to require mapping whole genome reads to a single gene. Any suggestions are greatly appreciated.

sequence alignment SNP gene • 592 views

updated 3.6 years ago by Ram 43k • written 3.6 years ago by bagheri • 0


You need to map the reads to the whole genome, not to a single gene. The reference should match the sequencing library the reads came from (whole genome, exome, or other targeted panel). You've downloaded whole genome sequencing data, so you'll need to align to the entire genome. Mapping to a single gene would be inaccurate, because reads that originate from similar sequences elsewhere in the genome can be forced onto the gene and show up as false SNPs. Once you have aligned the reads and called variants, you can zero in on your gene of interest for further downstream analysis.
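
As a rough sketch of that workflow (not necessarily the exact pipeline intended here), the Python helper below only assembles the shell commands for one sample: index the reference, align with `bwa mem`, sort and index the BAM with `samtools`, and call variants with `bcftools`. The function name `build_commands`, the file names, and the choice of bwa/samtools/bcftools are assumptions made for illustration; `--ploidy 1` is included on the assumption that the organism is haploid.

```python
import shlex


def build_commands(ref, reads1, reads2, sample):
    """Return shell command strings for a basic align-and-call workflow.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    # Quote paths so that unusual file names do not break the shell commands.
    ref, reads1, reads2 = (shlex.quote(p) for p in (ref, reads1, reads2))
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf.gz")
    return [
        # Build the BWA index for the whole reference genome.
        f"bwa index {ref}",
        # Align the paired-end reads and write a coordinate-sorted BAM.
        f"bwa mem {ref} {reads1} {reads2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Call variants; --ploidy 1 assumes a haploid genome.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv --ploidy 1 -Oz -o {vcf}",
        f"bcftools index {vcf}",
    ]


if __name__ == "__main__":
    # Placeholder file names; replace with your own reference and reads.
    for cmd in build_commands("reference.fa", "strainA_1.fastq.gz",
                              "strainA_2.fastq.gz", "strainA"):
        print(cmd)
```

Each printed command can be run in a shell, or passed to `subprocess.run(cmd, shell=True, check=True)`, once per strain.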

3.6 years ago by Ram 43k


@RamRS Many thanks for your response. I will follow your advice, map all the reads against the reference genome, and call the SNPs. How can I zero in on my gene of interest?

3.6 years ago by bagheri • 0


Annotate your VCF with VEP, the Ensembl Variant Effect Predictor (or extract the gene region from the VCF and annotate just that). That will give you all the information you need.
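
For the "extract the gene region" option, here is a minimal pure-Python sketch, assuming an uncompressed VCF and that you already know the gene's coordinates on the reference. The function name `records_in_region`, the contig name `chrom1`, and the coordinates 1000 to 2799 are hypothetical placeholders, not the real location of gene X.

```python
def records_in_region(vcf_lines, chrom, start, end):
    """Yield header lines plus the records with POS in [start, end].

    Coordinates are 1-based and inclusive, as in the VCF POS column.
    Header lines are passed through so the output is still a VCF.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line


if __name__ == "__main__":
    # Tiny made-up VCF; the contig name and positions are placeholders.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chrom1\t999\t.\tA\tG\t50\t.\t.",
        "chrom1\t1500\t.\tC\tT\t60\t.\t.",
        "chrom1\t2800\t.\tG\tA\t55\t.\t.",
    ]
    for rec in records_in_region(example, "chrom1", 1000, 2799):
        print(rec)
```

For a bgzip-compressed and indexed VCF, `bcftools view -r CHROM:START-END` performs the same kind of region extraction without custom code.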