Question: Creating Hg19 Reference Index
Davy (United States) wrote, 8.5 years ago:

I am creating the reference index in order to align my FASTQ reads. My question is: should I include all the "funny" chromosomes like the chr6hapcox or chrUnxxxxxxx?

I will be using GATK downstream for other parts of the analysis, and I know it doesn't like it when the chromosomes are out of order. What should I do about these non-canonical chromosomes, which don't have (at least to me) an implicit ordering?

Tags: next-gen, bwa, hg19
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 8.5 years ago:

See Heng Li's page: http://lh3lh3.users.sourceforge.net/humanref.shtml

For variant discovery, RNA-seq and ChIP-seq, it is recommended to use the entire primary assembly, including assembled chromosomes AND unlocalized/unplaced contigs, for the purpose of read mapping. Not including unlocalized and unplaced contigs potentially leads to more mapping errors.


I realise this, but the problem is it will cause GATK to throw an error when parsing the BAM files. Any ideas on how to get GATK to not break?

Reply written 8.5 years ago by Davy

You can map with your favorite tool and then filter the hits on the funny chromosomes out of the SAM/BAM file afterwards.

Reply written 8.5 years ago by JC
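
The filtering step suggested in this reply could look roughly like the following. It is only a sketch working on plain-text SAM lines: the regular expression assumes UCSC-style hg19 names (chr1 to chr22, chrX, chrY, chrM), and `filter_sam_lines` is a hypothetical helper name, not part of any tool mentioned in this thread. Unmapped reads, and pairs where either mate sits on a non-canonical contig, are dropped.

```python
import re

# Assumption: UCSC-style hg19 names for the canonical chromosomes.
CANONICAL = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def is_canonical(name):
    """Return True if the contig name is one of the canonical chromosomes."""
    return CANONICAL.fullmatch(name) is not None


def filter_sam_lines(lines):
    """Yield only the SAM lines that refer to canonical chromosomes.

    Header lines are kept, except @SQ lines describing other contigs.
    Alignment records are kept only if RNAME (column 3) is canonical and
    RNEXT (column 7) is '=', '*' or canonical as well.
    """
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                fields = line.rstrip("\n").split("\t")
                name = next((f[3:] for f in fields if f.startswith("SN:")), "")
                if not is_canonical(name):
                    continue
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7 or not is_canonical(cols[2]):
            continue  # malformed, unmapped, or on a non-canonical contig
        if cols[6] not in ("=", "*") and not is_canonical(cols[6]):
            continue  # mate lies on a contig that is being removed
        yield line
```

Whether throwing these reads away is acceptable depends on the analysis; the quote from Heng Li's page above is about mapping against the full assembly, not about what to keep afterwards.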
Rok (Trondheim, Norway) wrote, 8.5 years ago:

If you have your chromosomes in separate FASTA files, you should merge everything into one FASTA file (the whole genome). Then use CreateSequenceDictionary from Picard Tools to create a dictionary for this whole genome. The dictionary stores the order of the chromosomes in the whole-genome file.
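
One way to pick an explicit order before merging is a sort key like the one below. This is a sketch under assumptions, not a GATK requirement: it puts the numbered chromosomes first in numeric order, then chrX, chrY and chrM, then every other contig alphabetically. The position of chrM and of the non-canonical contigs is an arbitrary choice here; the point is that the reference, its dictionary and the BAM header all end up with the same order. `contig_sort_key` is a hypothetical helper, and the contig names are just examples.

```python
def contig_sort_key(name):
    """Sort key: chr1..chr22 numerically, then X, Y, M, then the rest by name."""
    base = name[3:] if name.startswith("chr") else name
    if base.isdigit():
        return (0, int(base), "")
    special = {"X": 1, "Y": 2, "M": 3}
    if base in special:
        return (1, special[base], "")
    # Non-canonical contigs (haplotypes, unplaced) go last, alphabetically.
    return (2, 0, name)


# Example names only; a real list would come from the per-chromosome files.
names = ["chrUn_gl000211", "chr10", "chrX", "chr2", "chr6_cox_hap2", "chrM", "chr1"]
ordered = sorted(names, key=contig_sort_key)
print(ordered)
```

The per-chromosome FASTA files can then be concatenated in the order given by `ordered`.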

When you map against an index built from the whole-genome FASTA file, most mapping software should produce a SAM/BAM header that matches the dictionary file. If it happens to be different (TopHat sometimes seems to sort chromosomes a bit differently), you can use ReorderSam from Picard Tools to reorder the SAM/BAM file so that it matches the dictionary. Be careful to provide ReorderSam with the correct dictionary file.
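
To check whether the header and the dictionary agree before reaching for ReorderSam, the @SQ lines of both can be compared. The sketch below relies on the fact that a Picard .dict file is itself a SAM-style header with @SQ lines; `sequence_names` and `same_contig_order` are hypothetical helper names, and the header lines are assumed to be available as text (for example dumped from the BAM with a tool of your choice).

```python
def sequence_names(lines):
    """Return the SN: values of all @SQ lines, in file order."""
    names = []
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def same_contig_order(dict_lines, sam_header_lines):
    """True if both headers list the same contigs in the same order."""
    return sequence_names(dict_lines) == sequence_names(sam_header_lines)
```

If `same_contig_order` returns False, comparing the two name lists side by side shows whether the contigs merely appear in a different order or whether the two files were built from different references.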
