VCFtools MAF Filter: Efficiency Inquiry
9 months ago
S • 0

I am running a minor allele frequency (MAF) filter on the cloud as part of my analysis.

vcftools --recode --recode-INFO-all --gzvcf /path/to/input.vcf --maf 0.01 --out output.maf.vcf > stdout.out

This process is taking exceedingly long (1 hour for a 30 GB Chr1 file) on a c5.4xlarge instance. I considered using threads, chunking, or other ways of subsetting the data, but I ran into trouble implementing them. I read through the VCFtools documentation and could not find any threading or chunking option for this command.

Another approach I thought of was decompressing the gzipped file, writing only the genotype information to a new VCF file, and then filtering that file for MAF. This does not seem like the most efficient way to perform a MAF filter step.

Is there anything I am not considering while trying to speed up this process?

Thank you for your consideration

vcf maf vcftools • 450 views
9 months ago

The main thing you aren't considering is that vcftools was mostly superseded by bcftools several years ago.
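As a rough illustration of that suggestion, a bcftools equivalent of the command in the question might look like the sketch below. The function name `maf_filter`, the file paths, and the default thread count are placeholders and not part of the original post. `--min-af` with the `:minor` suffix filters on the least frequent allele, and `--threads` only speeds up compression of the output, so the gain depends on where the time is actually going.

```shell
# Hypothetical helper: filter a (b)gzipped VCF by minor allele frequency
# with bcftools instead of vcftools. All argument values are placeholders.
maf_filter() {
    local input="$1"           # e.g. /path/to/input.vcf.gz
    local output="$2"          # e.g. output.maf.vcf.gz
    local min_maf="${3:-0.01}" # threshold taken from the question
    local threads="${4:-4}"    # extra threads for output compression only

    # --min-af X:minor  apply the frequency threshold to the least frequent allele
    # --output-type z   write compressed VCF
    bcftools view \
        --min-af "${min_maf}:minor" \
        --threads "${threads}" \
        --output-type z \
        --output "${output}" \
        "${input}"
}

# Example call (placeholder paths):
# maf_filter /path/to/input.vcf.gz output.maf.vcf.gz 0.01 8
```

The two tools may not treat multiallelic sites identically, so it is worth comparing the output of both on a small region before switching over. If the input is bgzipped and indexed, `bcftools view --regions` can also be used to process chunks of the chromosome in parallel and concatenate the results afterwards.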
