Preprocessing genome FASTA file prior to mapping?
7.9 years ago
lstbl ▴ 40

Hi Everyone,

Sorry if this is a dup, but I can't seem to find a satisfactory answer on this site or others.

I'm wondering what, if any, pre-processing I should perform on a reference genome FASTA/GFF file prior to mapping with BWA or Bowtie. For example, if I wanted to map reads to the orangutan genome, should I remove entries labeled "unplaced/unlocalized genomic scaffold" from the GFF and FASTA files, i.e. map only to the canonical chromosomes?

I notice that even these scaffolds have "BestRefSeq" categories in the gff file for genes, indicating that they still have useful information on them.

The reason I ask is that I was told by someone who no doubt knows much more than I do about this stuff that I SHOULD remove these scaffolds. I'm wondering, however, whether this person is wrong.

Thanks!

next-gen bwa sequencing bowtie • 1.6k views
7.9 years ago
GenoMax 141k

That is your choice.
If the sequence is known to belong to the genome (but is unplaced at the moment) then it should remain in the reference. You can ignore reads that align there if you are not interested in that region. Omitting the region from the reference may force the aligner to place those reads elsewhere, which you probably do not want.
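To illustrate the "align to everything, ignore afterwards" approach, here is a minimal Python sketch that filters SAM output after mapping instead of trimming the reference beforehand. The function name `keep_canonical` and the naming pattern for canonical chromosomes (`chr1`, `chr2A`, `chrX`, ...) are assumptions made for this example; check the headers of your own FASTA (NCBI assemblies, for instance, use accession-style names) and adjust the pattern accordingly.

```python
import re

# Assumption: canonical chromosomes are named like "chr1", "chr2A", "chrX".
# Adjust this pattern to match the sequence names in your own reference.
CANONICAL = re.compile(r"^chr([0-9]+[AB]?|X|Y|M)$")

def keep_canonical(sam_lines, pattern=CANONICAL):
    """Yield SAM header lines unchanged, plus alignment lines whose
    reference name (RNAME, third column) matches the pattern."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header line: pass through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Unmapped reads have RNAME "*" and are dropped here as well.
        if len(fields) > 2 and pattern.match(fields[2]):
            yield line

if __name__ == "__main__":
    import sys
    # Usage sketch: python filter_sam.py < aligned.sam > canonical_only.sam
    sys.stdout.writelines(keep_canonical(sys.stdin))
```

Because the reads were mapped against the full reference, anything that truly belongs on an unplaced scaffold ends up there and is simply discarded at this step, rather than being forced onto a canonical chromosome. Tools such as samtools can do the equivalent selection on indexed BAM files.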
