Question: Vcftools Runs Of Homozygosity Option
Rubal7 wrote (5.7 years ago):

Hi all,

Does anybody understand how the --LROH option works in VCFtools? It is designed to detect long runs of homozygosity, which is exactly what I want to do, but without knowing more about the assumptions it makes I am hesitant to use it. There is also no clear explanation of how to interpret the output files. Any help or pointers would be greatly appreciated. Thank you in advance.

Cheers,

Rubal

Adam wrote (5.7 years ago):

The LROH option implements the algorithm described in Auton et al., Genome Research, 2009, but uses the forward-backward algorithm in place of Viterbi. Unfortunately, the option in VCFtools is very experimental at the moment and isn't really ready for "prime time" (hence it is listed under "Options still in development"). Specifically, it is very slow to run on any dataset of reasonable size, so I don't recommend it for regular use.
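For readers unfamiliar with the forward-backward algorithm mentioned above, here is a minimal sketch on a generic two-state hidden Markov model. It is not the VCFtools implementation: the state labels (0 = not autozygous, 1 = autozygous), the observation coding (0 = homozygous genotype, 1 = heterozygous genotype) and every probability in the usage lines are assumptions chosen only for illustration. It shows how a per-site posterior probability of each state is obtained, rather than the single best path that Viterbi returns.

```python
def forward_backward(obs, init, trans, emit):
    """Per-site posterior state probabilities for a discrete HMM.

    obs   : list of observation indices, one per site
    init  : init[s]     = P(state s at the first site)
    trans : trans[r][s] = P(state s at the next site | state r at this site)
    emit  : emit[s][o]  = P(observation o | state s)
    """
    n_states, n = len(init), len(obs)
    if n == 0:
        return []

    # Forward pass, rescaled at every site to avoid numerical underflow.
    cur = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    total = sum(cur)
    fwd = [[c / total for c in cur]]
    for t in range(1, n):
        prev = fwd[-1]
        cur = [emit[s][obs[t]] * sum(prev[r] * trans[r][s] for r in range(n_states))
               for s in range(n_states)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, rescaled in the same way.
    bwd = [[1.0] * n_states for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r]
                   for r in range(n_states))
               for s in range(n_states)]
        total = sum(cur)
        bwd[t] = [c / total for c in cur]

    # Posterior at each site is proportional to forward * backward.
    post = []
    for t in range(n):
        w = [fwd[t][s] * bwd[t][s] for s in range(n_states)]
        total = sum(w)
        post.append([x / total for x in w])
    return post


# Illustrative parameters only (assumptions, not values used by VCFtools).
init = [0.5, 0.5]
trans = [[0.99, 0.01], [0.01, 0.99]]
emit = [[0.7, 0.3],       # not autozygous: heterozygous calls are common
        [0.999, 0.001]]   # autozygous: heterozygous calls only through error
obs = [1, 1, 1] + [0] * 60 + [1, 1, 1]
posterior = forward_backward(obs, init, trans, emit)
p_autozygous = [p[1] for p in posterior]
```

Thresholding `p_autozygous` and merging consecutive sites above the threshold would give candidate regions. A real implementation would also need per-site allele frequencies and distance-dependent transition probabilities, which this sketch leaves out.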

Regards, Adam


Thanks for this!

Reply written 5.7 years ago by Rubal7

Thanks Adam. I have tested the LROH option on some non-human data and got some interesting results that I want to investigate further. Is there any reason to believe that running the LROH option would be problematic for mouse or rat genomes, for instance if it incorporates assumptions about recombination rate based on a human reference? I just want to check, because it's not clear exactly how VCFtools implements the Auton et al. algorithm. Best, Rubal

Reply written 5.6 years ago by Rubal7

What does your command line look like? I can't seem to get any output data using "vcftools --vcf sorted.vcf --LROH --chr chr1".

Reply written 4.7 years ago by mylons

If I have interpreted the Auton et al. 2009 paper correctly, the algorithm will find ROH at least 1 cM in length that contain at least 50 SNPs. Is the number of SNPs cumulative across individuals, such that 50 individuals could each have a single (different) SNP in this region? This is the only explanation that comes to mind for my output file, which looks like this (truncated, of course):

CHROM  AUTO_START  AUTO_END  N_VARIANTS  INDV
chr6   3192774     7941907   85          M103
chr6   32714130    32714172  1           M103
chr6   32965687    33317372  9           M103
chr6   35146229    35975515  54          M103
chr6   18166887    18166904  8           M105r
chr6   18166908    18526735  19          M105
chr6   20216335    20216336  1           M113
chr6   20216340    20216341  1           M113
chr6   20216343    20216345  1           M113
chr6   20216350    20216356  1           M113

It's not clear to me why some of the regions (e.g., the bold line) are so small.

Thanks,
Loren

Reply written 2.7 years ago by sackettl

The option in VCFtools just outputs all regions where the probability of autozygosity is about 0.99. It is left to the user to filter these events as required. 
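Since filtering is left to the user, here is a minimal sketch of one way to do it in Python. The thresholds (`min_length_bp`, `min_variants`) and the function name `filter_lroh` are hypothetical placeholders, not VCFtools defaults, and the span is measured in base pairs rather than cM. The column names are taken from the two headers posted in this thread; the variant-count column is looked up under either name.

```python
import io


def filter_lroh(handle, min_length_bp=1_000_000, min_variants=50):
    """Return rows of an LROH table that pass both thresholds.

    handle : open text file (or any iterable of lines) with a header line
    Each kept row is a dict of the original columns plus LENGTH_BP.
    """
    lines = (line.split() for line in handle if line.strip())
    header = next(lines, None)
    if header is None:
        return []
    # The variant-count column is named differently in the two headers above.
    if "N_VARIANTS" in header:
        n_col = "N_VARIANTS"
    else:
        n_col = "N_VARIANTS_BETWEEN_MAX_BOUNDARIES"

    kept = []
    for fields in lines:
        row = dict(zip(header, fields))
        # Span between the reported start and end positions, in base pairs.
        length = int(row["AUTO_END"]) - int(row["AUTO_START"])
        if length >= min_length_bp and int(row[n_col]) >= min_variants:
            row["LENGTH_BP"] = length
            kept.append(row)
    return kept


# A few rows copied from the output posted earlier in the thread.
example = io.StringIO(
    "CHROM\tAUTO_START\tAUTO_END\tN_VARIANTS\tINDV\n"
    "chr6\t3192774\t7941907\t85\tM103\n"
    "chr6\t32714130\t32714172\t1\tM103\n"
    "chr6\t35146229\t35975515\t54\tM103\n"
)
for row in filter_lroh(example):
    print(row["CHROM"], row["AUTO_START"], row["AUTO_END"], row["LENGTH_BP"], row["INDV"])
```

For a real run you would pass an open file instead of the `io.StringIO` example, and pick thresholds that make sense for your species and marker density.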

I've actually made a few changes to the LROH option in the past couple of days, so the function should be a little more informative (and run slightly more efficiently). These changes are now available in the SVN version.

Reply written 2.7 years ago by Adam
afadda wrote (11 months ago):

Hi, I'm running this command:

vcftools --LROH --gzvcf file.vcf.gz --out file --chr 1

and getting this log on the screen:

VCFtools - v0.1.13
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --gzvcf 0200092201_S3_L002.recalibrate_BOTH.vcf.gz
    --chr 1
    --LROH
    --out /gpfs/projects/afadda/0200092201-chr1

Using zlib version: 1.2.3
Versions of zlib >= 1.2.4 will be much faster when reading zipped VCF files.
After filtering, kept 1 out of 1 Individuals
Outputting Long Runs of Homozygosity (Experimental)...
    0200092201_S3_L002
After filtering, kept 50811 out of a possible 579645 Sites
Run Time = 8.00 seconds

But the output file contains only the header. Even if I use --stdout, I still get only the header:

CHROM AUTO_START AUTO_END MIN_START MAX_END N_VARIANTS_BETWEEN_MAX_BOUNDARIES N_MISMATCHES INDV

I don't know what the problem is.

thanks!


Hi, I am having the same problem. After running the --LROH command I get only the header line in the output file. Can someone please explain this?

Thank you.
Best,
Rangi

Reply written 22 days ago by sumudu_rangika