Question: Creating Hg19 Reference Index
1
6.8 years ago by
Davy360
United States
Davy360 wrote:

I am creating the reference index in order to align my FASTQ reads. Should I include all the "funny" chromosomes, like chr6hapcox or chrUnxxxxxxx?

I will be using GATK downstream for other parts of the analysis, and I know it complains when the chromosomes are out of order. What should I do about these non-canonical chromosomes, which (at least to me) have no implicit ordering?

next-gen bwa hg19 • 3.9k views
8
6.8 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

See Heng Li's page: http://lh3lh3.users.sourceforge.net/humanref.shtml

For variant discovery, RNA-seq and ChIP-seq, it is recommended to use the entire primary assembly, including assembled chromosomes AND unlocalized/unplaced contigs, for the purpose of read mapping. Not including unlocalized and unplaced contigs potentially leads to more mapping errors.


I realise this, but the problem is that it causes GATK to throw an error when parsing the BAM files. Any ideas on how to keep GATK from breaking?

written 6.8 years ago by Davy360

You can map with your favorite tool and then filter the funny hits out of the SAM/BAM file.

written 6.8 years ago by JC7.6k
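
To illustrate the filtering idea in the reply above, here is a minimal Python sketch that works on SAM text lines. It assumes UCSC-style names (chr1 to chr22, chrX, chrY, chrM) for the canonical chromosomes; the function names `is_canonical` and `filter_sam_lines` are made up for this example and are not part of any tool. It also drops records whose mate sits on a removed contig, so the output never refers to a sequence missing from the header. Note that the surviving mate keeps its original flags, which may or may not suit your downstream tools.

```python
import re

# Assumption: canonical hg19 contigs use UCSC-style names.
CANONICAL = re.compile(r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)")


def is_canonical(name):
    """Return True for chr1..chr22, chrX, chrY and chrM."""
    return CANONICAL.fullmatch(name) is not None


def sq_name(header_line):
    """Extract the SN: value from an @SQ header line, or None if absent."""
    for field in header_line.rstrip("\n").split("\t")[1:]:
        if field.startswith("SN:"):
            return field[3:]
    return None


def filter_sam_lines(lines):
    """Yield only header lines and records that refer to canonical contigs."""
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ\t"):
                name = sq_name(line)
                if name is None or not is_canonical(name):
                    continue  # drop the header entry for a "funny" contig
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        rname, rnext = cols[2], cols[6]
        # "*" means unmapped; those records are kept here.
        if rname != "*" and not is_canonical(rname):
            continue
        # RNEXT is "=", "*" or a contig name; drop mates on removed contigs.
        if rnext not in ("=", "*") and not is_canonical(rnext):
            continue
        yield line
```

Feeding it an open SAM file, for example `out.writelines(filter_sam_lines(sam_in))`, writes the filtered copy. For BAM input you would first need to convert to SAM text or use a library that reads BAM.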
0
6.8 years ago by
Rok180
Trondheim, Norway
Rok180 wrote:

If you have your chromosomes in separate FASTA files, you should merge everything into one FASTA file (whole genome). Then use CreateSequenceDictionary from Picard Tools to create a dictionary for this whole genome. The dictionary stores the order of the chromosomes in the whole-genome file.
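
As a rough sketch of the merge step, the Python below concatenates per-chromosome FASTA files in one fixed order. The order used here (chr1 to chr22, chrX, chrY, chrM, then every other contig alphabetically) is only an assumption for illustration, as are the names `contig_sort_key` and `merge_fasta`; check which order your GATK version and its resource files expect before settling on one.

```python
import re


def contig_sort_key(name):
    """Sort key: canonical chromosomes first, all other contigs by name."""
    m = re.fullmatch(r"chr(\d+|X|Y|M)", name)
    if m is None:
        return (1, 0, name)  # non-canonical contigs go last
    label = m.group(1)
    rank = int(label) if label.isdigit() else {"X": 23, "Y": 24, "M": 25}[label]
    return (0, rank, name)


def merge_fasta(paths_by_contig, out_path):
    """Concatenate one FASTA file per contig into out_path; return the order."""
    order = sorted(paths_by_contig, key=contig_sort_key)
    with open(out_path, "wb") as out:
        for name in order:
            last = b"\n"
            with open(paths_by_contig[name], "rb") as src:
                for line in src:  # stream line by line, files can be large
                    out.write(line)
                    last = line
            if not last.endswith(b"\n"):
                out.write(b"\n")  # keep the next '>' header on its own line
    return order
```

The returned list is the contig order of the merged file, which is the order the sequence dictionary built from that file will record.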

When you map against an index built from the whole-genome FASTA file, most mapping software should produce a SAM/BAM header with the same chromosome order as the dictionary file. If it happens to be different (TopHat sometimes seems to sort chromosomes a bit differently), you can use ReorderSam from Picard Tools to reorder the SAM/BAM file to match the dictionary. Be careful to give ReorderSam the correct dictionary file.
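
Before reaching for ReorderSam, it can help to check whether the two orders really differ. The sketch below compares the @SQ lines of a dictionary file with those of a SAM/BAM header (for instance the text printed by `samtools view -H`). A .dict file is itself SAM header text, so the same parser handles both. The function names `contig_order` and `first_mismatch` are hypothetical, chosen for this example only.

```python
def contig_order(header_lines):
    """Return the contig names in the order of the @SQ header lines."""
    names = []
    for line in header_lines:
        if not line.startswith("@SQ\t"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def first_mismatch(dict_lines, bam_header_lines):
    """Return (index, dict_name, bam_name) at the first difference, else None."""
    ref = contig_order(dict_lines)
    bam = contig_order(bam_header_lines)
    for i, (a, b) in enumerate(zip(ref, bam)):
        if a != b:
            return i, a, b
    if len(ref) != len(bam):
        # One list is a prefix of the other; report the first extra contig.
        i = min(len(ref), len(bam))
        return (
            i,
            ref[i] if i < len(ref) else None,
            bam[i] if i < len(bam) else None,
        )
    return None
```

A result of `None` means the header already follows the dictionary order; anything else points at the first contig where the two disagree.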
