Question: Splitting reference genome for alignment
prasundutta87 wrote (4 weeks ago):

Hi,

I am only interested in aligning DNAseq reads to certain genes. If I split my reference genome based on the coordinates of my genes of interest (as given in the GTF/GFF file) and then use BWA to align my reads to the resulting 'smaller reference genomes', is that a good idea?

If so, how many bases upstream and downstream of the gene coordinates should I include? And what caveats of splitting the reference genome should I pay attention to?

My motivation for this approach is to reduce alignment time, since I am only interested in, say, 20-30 genes rather than all genes.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (4 weeks ago):

Quoting the question: "...for aligning my reads to the resulting 'smaller reference genomes', will it be a good idea?"

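No, you'll get false positives. Reads that actually come from elsewhere in the genome (paralogs, pseudogenes, repeats) have nowhere else to go, so BWA places them on your small reference, and their differences from it show up as variants. It's the same problem as in the thread "Exome Sequencing: Masking The Non-Genic Sequences?" (here you're effectively 'masking' whole chromosomes). Quoting Heng Li: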

"This will lead to wrongly mapped sequences, spurious SNPs/indels calls and all sorts of problems. I cannot think of a single use case when masking [before mapping] may lead to better outcomes."
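To make the false-positive mechanism concrete, here is a toy sketch in Python. It is not how BWA works: it assumes a brute-force, ungapped aligner that reports the position with the fewest mismatches, and the two sequences are made up, standing in for a gene and a near-identical paralog elsewhere in the genome. A read that truly comes from the paralog maps perfectly when the whole "genome" is available, but is forced onto the gene, with one mismatch, when the reference is cut down to the gene alone. That mismatch is exactly the kind of spurious SNP described above.

```python
# Toy illustration only (NOT BWA): exhaustive, ungapped alignment that
# reports the hit with the fewest mismatches.

def best_hit(read, reference):
    """Return (contig, 0-based position, mismatches) of the best ungapped hit."""
    best = None
    for name, seq in reference.items():
        for pos in range(len(seq) - len(read) + 1):
            window = seq[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            # Keep the first hit with the strictly lowest mismatch count.
            if best is None or mismatches < best[2]:
                best = (name, pos, mismatches)
    return best


# Hypothetical sequences: a gene and a paralog differing at index 10 (G vs T).
gene = "ACGTTGCAAGGCTTACGATC"
paralog = "ACGTTGCAAGTCTTACGATC"

full_genome = {"chrA_gene": gene, "chrB_paralog": paralog}
gene_only = {"chrA_gene": gene}  # the "split" reference

read = paralog[2:18]  # a read that truly originates from the paralog

print(best_hit(read, full_genome))  # prints ('chrB_paralog', 2, 0): true origin
print(best_hit(read, gene_only))    # prints ('chrA_gene', 2, 1): mis-mapped, fake SNP
```

With the full reference the read has a perfect home; remove that home and the aligner still reports a plausible hit, only now with a mismatch that a variant caller would happily report.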


Quoting the question: "I am only interested in aligning DNAseq reads to certain genes"

What you can do instead is align to the whole genome and then discard the reads that fall outside your genes, after bwa and before sorting:

bwa (...) | samtools view -L my.bed (...) | samtools sort (...)
(Reply by Pierre Lindenbaum, 4 weeks ago.)
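If you need the `my.bed` file used with `samtools view -L`, a minimal Python sketch for generating it from a GTF follows. It assumes the GTF has `gene` feature lines carrying a `gene_name` attribute (common in Ensembl-style GTFs, but check yours); the function name `genes_to_bed`, the gene names and the default `pad` of 1000 bp are hypothetical placeholders. It converts GTF's 1-based closed coordinates to BED's 0-based half-open ones and extends each gene by `pad` bases on both sides.

```python
import re

# Matches the gene_name attribute in GTF column 9, e.g. gene_name "GENE1";
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def genes_to_bed(gtf_lines, wanted, pad=1000):
    """Hypothetical helper: yield BED lines for 'gene' features whose
    gene_name is in `wanted`, extended by `pad` bases on each side."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        match = GENE_NAME.search(fields[8])
        if match is None or match.group(1) not in wanted:
            continue
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        # GTF is 1-based and closed; BED is 0-based and half-open.
        bed_start = max(0, start - 1 - pad)
        # The end may run past the contig length; clip it against a .fai if needed.
        bed_end = end + pad
        yield f"{chrom}\t{bed_start}\t{bed_end}\t{match.group(1)}"


# Made-up GTF records for illustration only.
example_gtf = [
    "#!genome-build example",
    'chr1\tsrc\tgene\t500\t2000\t.\t+\t.\tgene_id "g1"; gene_name "GENE1";',
    'chr1\tsrc\texon\t500\t800\t.\t+\t.\tgene_id "g1"; gene_name "GENE1";',
    'chr2\tsrc\tgene\t10000\t12000\t.\t-\t.\tgene_id "g2"; gene_name "GENE2";',
    'chr3\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "g3"; gene_name "OTHER";',
]

# Prints two tab-separated lines: chr1 0 3000 GENE1, then chr2 8999 13000 GENE2
for bed_line in genes_to_bed(example_gtf, {"GENE1", "GENE2"}, pad=1000):
    print(bed_line)
```

Since `samtools view -L` keeps any alignment that overlaps a BED interval, reads straddling a gene boundary are retained even with `pad=0`; padding only matters if you also want flanking sequence such as promoters. The filter works per alignment record, so a mate that maps outside the regions is dropped from the output. For a real annotation, pass an open file handle (e.g. `open("annotation.gtf")`, or `gzip.open(..., "rt")` for a compressed one) and write the yielded lines to `my.bed`.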

Makes sense, Pierre. Thank you for pointing out the caveat.

(Reply by prasundutta87, 4 weeks ago.)