Question

hg19 or hg38 for variant calling

1

6.2 years ago

jsneaththompson ▴ 100

I've recently been troubleshooting an error in my variant calling pipeline. I traced it back to using BAM files aligned to hg38 as input for an Agilent deduplication tool that has not yet migrated from hg19 to hg38. My current workaround is to align to hg19, deduplicate, convert the resulting SAM back into FASTQ files, and re-align to hg38, which seems convoluted.

Should I continue working with hg38 once I'm past this step, or should I stick with hg19 all the way? How do other people balance pipelines when some tools/datasets are in hg38 and others have yet to switch over from hg19? Any advice on this whole hg19 v. hg38 issue would be appreciated.

Edit: The tool is LocatIt, which is used for deduplication of reads by the molecular barcodes used in the HaloPlex HS Target Enrichment System. https://www.agilent.com/cs/library/software/Public/AGeNT%20ReadMe.pdf

genome variant calling Assembly • 2.5k views

updated 6.2 years ago by h.mon 35k • written 6.2 years ago by jsneaththompson ▴ 100

0

And you have to use this Agilent deduplication tool? There are alternatives, unless it's something specific you need.

6.2 years ago by WouterDeCoster 47k

0

Second that. Look at Clumpify (see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files"), which can also remove duplicates.

6.2 years ago by GenoMax 141k

0

The tool is LocatIt, which is used for deduplication of reads by the molecular barcodes used in the HaloPlex HS Target Enrichment System. https://www.agilent.com/cs/library/software/Public/AGeNT%20ReadMe.pdf

6.2 years ago by jsneaththompson ▴ 100

0

What is the tool? What kind of data? Is it the AgilentMBCDedup tool, used to process the molecular barcodes (MBC) of HaloPlex runs?

6.2 years ago by h.mon 35k

score 0 · Answer 1 · 2018-02-16

From the documentation you linked, LocatIt does not necessarily expect or use hg19; it just expects the chromosome names to follow its default naming scheme. Maybe you have random / unplaced / alt chromosomes? Anyway, did you try the -H parameter?

-H SAM header file: By default, LocatIt expects hg19 names, chr1-chrM. If the contig names are different (for example, GRCh37 names or nonhuman), one can use this option and provide a SAM header file containing a dictionary of the contigs used by the data files, SAM/BAM and, optionally, the bed file.
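
To check whether extra contigs are the culprit, something like the following might help. It is only a sketch: it takes SAM header text (for example, the output of `samtools view -H` on your hg38 BAM), lists contigs outside the default chr1-chrM set, and writes a minimal header dictionary of the kind the -H description mentions. The function names are made up for illustration, the reading of "chr1-chrM" as chr1..chr22 plus chrX, chrY and chrM is my assumption, and the contig lengths in the example are illustrative values. Whether LocatIt needs anything beyond the @SQ lines is not stated in the quoted text.

```python
# Assumption: "chr1-chrM" in the quoted documentation means
# chr1..chr22, chrX, chrY and chrM.
DEFAULT_CONTIGS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def parse_contigs(sam_header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM header."""
    contigs = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def nonstandard_contigs(contigs):
    """Names that fall outside the assumed default chr1-chrM set."""
    return [name for name, _ in contigs if name not in DEFAULT_CONTIGS]


def build_header(contigs):
    """Build minimal SAM header text (@HD plus one @SQ line per contig)."""
    lines = ["@HD\tVN:1.6\tSO:unknown"]
    lines += [f"@SQ\tSN:{name}\tLN:{length}" for name, length in contigs]
    return "\n".join(lines) + "\n"


# Illustrative header text, as it might come from `samtools view -H`.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chr1_KI270706v1_random\tLN:175055\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)

contigs = parse_contigs(example_header)
print(nonstandard_contigs(contigs))  # ['chr1_KI270706v1_random']
```

If the list is not empty, you could write `build_header(contigs)` to a file and pass that file with -H, or alternatively restrict the alignment to the primary chromosomes before deduplication.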