Question: preprocessing genome fasta file prior to mapping?
4.2 years ago by
lstbl40 wrote:

Hi Everyone,

Sorry if this is a dup, but I can't seem to find a satisfactory answer on this site or others.

I'm wondering what, if any, pre-processing I should perform on a reference genome FASTA/GFF file prior to mapping with BWA or Bowtie. For example, if I wanted to map reads to the orangutan genome, should I remove entries that are labeled "unplaced/unlocalized genomic scaffold" from the GFF and FASTA files, i.e. map only to the canonical chromosomes?

I notice that even these scaffolds have "BestRefSeq" categories in the gff file for genes, indicating that they still have useful information on them.

The reason I ask is that I was told by someone who no doubt knows much more than I do about this stuff that I SHOULD remove these scaffolds. I'm wondering, however, if this person is wrong.


sequencing bowtie bwa next-gen • 987 views
4.2 years ago by
United States
genomax87k wrote:

That is your choice.
If the sequence is known to belong to the genome (but is unplaced at the moment), then it should remain. You can ignore reads that align there if you are not interested in that region. Omitting the region from the reference may force an aligner to place those reads elsewhere, which you probably do not want.
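
To make the "align to everything, ignore afterwards" idea concrete, here is a minimal sketch in Python that filters SAM text after mapping instead of trimming the reference beforehand. It assumes NCBI RefSeq-style names, where assembled chromosomes start with "NC_" and unplaced/unlocalized scaffolds start with "NW_"; check your own FASTA headers, since other sources name sequences differently. The function names is_canonical and filter_sam are made up for this illustration.

```python
def is_canonical(ref_name, prefixes=("NC_",)):
    """Return True if the reference name looks like an assembled chromosome.

    Assumption: RefSeq-style accessions, where 'NC_' marks chromosomes.
    Adjust the prefixes to match the headers in your own FASTA file.
    """
    return ref_name.startswith(prefixes)


def filter_sam(lines, keep=is_canonical):
    """Yield SAM header lines unchanged, plus alignments whose RNAME passes keep().

    Unmapped reads (RNAME '*') and reads placed on scaffolds are dropped.
    Only the read's own RNAME (column 3) is checked, not the mate's RNEXT.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Not a valid alignment record; skip it.
            continue
        if keep(fields[2]):
            yield line


if __name__ == "__main__":
    import sys

    # Example: python filter_sam.py < all_alignments.sam > canonical_only.sam
    for record in filter_sam(sys.stdin):
        sys.stdout.write(record)
```

Because the scaffolds were still in the reference during mapping, reads that truly belong to them were free to align there rather than being forced onto a chromosome; the filter only decides what you look at downstream.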
