Question: Splitting reference genome for alignment
11 months ago, prasundutta87 wrote:

Hi,

I am only interested in aligning DNA-seq reads to certain genes. If I split my reference genome based on the coordinates of my genes of interest (as present in the GTF/GFF file) and then use BWA for aligning my reads to the resulting 'smaller reference genomes', will it be a good idea?

If yes, is there a recommended number of bases upstream and downstream of the gene coordinates that should be included? And what caveats of splitting the reference genome should I pay attention to?

My motive for using this method is to reduce alignment time, as I am only interested in, say, 20-30 genes and not all genes.

11 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

for aligning my reads to the resulting 'smaller reference genomes', will it be a good idea?

No, you'll get false positives. It's the same problem as in "Exome Sequencing: Masking The Non-Genic Sequences?" (you're effectively 'masking' a whole chromosome). Citing Heng Li:

"This will lead to wrongly mapped sequences, spurious SNPs/indels calls and all sorts of problems. I cannot think of a single use case when masking [before mapping] may lead to better outcomes."


I am only interested in aligning DNAseq reads to certain genes

What you can do is align to the whole genome and then remove the reads outside your regions, after bwa and before sorting:

bwa (...) | samtools view -L my.bed (...) | samtools sort (...)
written 11 months ago by Pierre Lindenbaum
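
As a rough sketch of the filter-after-alignment idea above (not part of the original answer): the BED file passed to samtools view -L could be built from the GTF with a small awk helper. Everything named here is an assumption for illustration: the function names gtf_to_bed and align_and_filter, the input file names, the use of "bwa mem" (the answer only says "bwa (...)"), the presence of a gene_name attribute on "gene" features in the GTF, and the padding value, which is not a recommended threshold.

```shell
#!/bin/sh
# Sketch only: helper names, file names and the padding size are hypothetical.

# gtf_to_bed GTF GENE_LIST PAD
# Print one BED line (chrom, start, end, name) for every "gene" feature whose
# gene_name appears in GENE_LIST (one name per line), widened by PAD bases on
# each side. Assumes the GTF attribute column contains gene_name "...".
gtf_to_bed() {
    awk -v pad="$3" 'BEGIN { FS = OFS = "\t" }
        NR == FNR { want[$1] = 1; next }        # first file: wanted gene names
        $0 !~ /^#/ && $3 == "gene" {
            if (match($9, /gene_name "[^"]+"/)) {
                # skip the 11 characters of: gene_name "  and drop the closing quote
                name = substr($9, RSTART + 11, RLENGTH - 12)
                if (name in want) {
                    start = $4 - 1 - pad        # GTF is 1-based, BED starts are 0-based
                    if (start < 0) start = 0    # clip at the chromosome start
                    # the end is NOT clipped to the chromosome length here
                    print $1, start, $5 + pad, name
                }
            }
        }' "$2" "$1"
}

# align_and_filter REF READS_1 READS_2 BED OUT_BAM
# Align against the FULL reference, keep only alignments overlapping BED,
# then sort. Defined but not called here, since it needs bwa and samtools
# installed and a bwa-indexed reference.
align_and_filter() {
    bwa mem "$1" "$2" "$3" \
        | samtools view -b -L "$4" - \
        | samtools sort -o "$5" -
}
```

Usage would look something like: gtf_to_bed genes.gtf genes.txt 1000 > my.bed, followed by align_and_filter ref.fa reads_1.fq reads_2.fq my.bed out.bam. Note that this does not shorten the alignment step itself, since the reads are still mapped to the whole genome; it only reduces what gets sorted and stored.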

Makes sense, Pierre. Thank you for letting me know about the caveat.

written 11 months ago by prasundutta87