Fasta.fai file error
2.6 years ago
stb1132 • 0

Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is:

bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted  -f 0.8 -r  -g Homo_sapiens_assembly38.fasta.fai 

For some of the files I am assessing, the command runs and produces output without issues, but for others I get the following error:

Error: The genome file Homo_sapiens_assembly38.fasta.fai has no valid entries. Exiting.

I have been looking into what could cause the problem, and I have seen that this is a fairly common failure caused by the structure of the genome file, which in my case is the following:

chrI  15072421 101 112

While according to the bedtools documentation itself, the structure should be

chrI  15072421
chrII 15279323
...
chrX  17718854
chrM  13794

My question is: how is it possible that I get output for some of the files but this error for others?

Thanks in advance!

bedtools • 2.2k views

You are using a version of bedtools prior to 2.29. More recent versions have changes in the way the -g file is read and more detailed error messages, so I'd suggest you try the current version to shed some light on this.


Hi,

Please try:

-g Homo_sapiens_assembly38.fasta

Kevin


A bedtools genome file, as used with -g, is a tab-delimited table giving chromosome names and lengths, and the desired order of the chromosomes. Only the first two columns are used, so a .fai file is suitable. The FASTA file itself is not suitable.


Indeed, a FASTA file is not what -g expects.

2.6 years ago
ATpoint 81k

What is the output of head Homo_sapiens_assembly38.fasta.fai? It should work if this file is tab-delimited and has the same chromosome names as the VCFs.
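
One way to check the second condition is to compare the chromosome names in both files directly. This is only a sketch: demo.genome and demo.vcf are made-up stand-ins written to a scratch directory, not the files from this thread, so substitute your own paths.

```shell
# Scratch directory with two tiny, hypothetical example files.
tmp=$(mktemp -d)
printf 'chr1\t248956422\nchr2\t242193529\n' > "$tmp/demo.genome"
printf '#CHROM\tPOS\nchr1\t46551\nchr3\t100\n' > "$tmp/demo.vcf"

# Unique chromosome names in each file (VCF header lines start with '#').
cut -f1 "$tmp/demo.genome" | sort -u > "$tmp/genome.names"
grep -v '^#' "$tmp/demo.vcf" | cut -f1 | sort -u > "$tmp/vcf.names"

# Names that occur in the VCF but are missing from the genome file.
missing=$(comm -13 "$tmp/genome.names" "$tmp/vcf.names")
echo "missing from genome file: $missing"
```

For a compressed VCF such as the -b file, the same check should work by reading it through gzip -dc instead of passing the path to grep.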


Thanks for all the replies. Using -g Homo_sapiens_assembly38.fasta actually gives the same error. This is the output of head Homo_sapiens_assembly38.fasta.fai:

chr1    248956422   112 100 101
chr2    242193529   251446211   100 101
chr3    198295559   496061788   100 101
chr4    190214555   696340415   100 101
chr5    181538259   888457250   100 101
chr6    170805979   1071811004  100 101
chr7    159345973   1244325155  100 101
chr8    145138636   1405264700  100 101
chr9    138394717   1551854835  100 101
chr10   133797422   1691633613  100 101

The genome file actually follows the required format.


There you see the problem: chr1 != chrI. You need a genome file whose chromosome names match those in the VCFs.

Yes, the error is expected when using the FASTA file, since that is not the correct input. Try to find the FASTA file that was used to generate the VCFs (i.e. the one used for the alignment) and build the .fai index from that one, or manually rename the (I guess Roman) numerals, e.g. I to 1, II to 2, V to 5, in the index file. It probably makes sense to make a copy with cut -f1,2 before doing that.


Actually, all files use the same notation for the chromosome names. This is the output of head -n 30 sorted.vcf:

#CHROM  POS ID  ALT REF QUALT   FILTER  INFO    FORMAT
chr1    46551   SSC_DEL_1_2 <DEL>   .   .   .   END=97000;SUPP=NA;SUPP_VEC=NA;SVLEN=50449;SVTYPE=<DEL>;SVMETHOD=NA;CHR2=NA;CIPOS=NA;CIEND=NA;STRANDS=NA .
chr1    50937   SSC_DEL_1_3 <DEL>   .   .   .   END=51053;SUPP=NA;SUPP_VEC=NA;SVLEN=116;SVTYPE=<DEL>;SVMETHOD=NA;CHR2=NA;CIPOS=NA;CIEND=NA;STRANDS=NA   .
chr1    65851   SSC_DEL_1_4 <DEL>   .   .   .   END=93750;SUPP=NA;SUPP_VEC=NA;SVLEN=27899;SVTYPE=<DEL>;SVMETHOD=NA;CHR2=NA;CIPOS=NA;CIEND=NA;STRANDS=NA .

head -n 10 Homo_sapiens_assembly38.fasta.fai

   chr1  248956422  112 100 101
    chr2    242193529   251446211   100 101
    chr3    198295559   496061788   100 101
    chr4    190214555   696340415   100 101
2.6 years ago

Error: The genome file Homo_sapiens_assembly38.fasta.fai has no valid entries. Exiting.

How did you generate this Homo_sapiens_assembly38.fasta.fai file? Could it have been edited in a text editor afterwards?

This error message would be produced if the file's columns are separated with space characters rather than tabs.
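
If that is the cause, something along these lines could confirm it and rebuild a usable genome file. It is a sketch under assumptions: demo.fai is a made-up, space-separated stand-in for the real index, written to a scratch directory so nothing is overwritten.

```shell
# Scratch directory with a hypothetical index whose columns are
# separated by spaces, as a text editor might leave them.
tmp=$(mktemp -d)
printf 'chr1 248956422 112 100 101\nchr2 242193529 251446211 100 101\n' > "$tmp/demo.fai"

# Count lines that have fewer than two tab-separated fields.
bad=$(awk -F'\t' 'NF < 2 {n++} END {print n+0}' "$tmp/demo.fai")
echo "lines without tab-separated columns: $bad"

# Rebuild a two-column, tab-delimited genome file. awk's default field
# splitting accepts any run of spaces or tabs and ignores leading blanks.
awk -v OFS='\t' '{print $1, $2}' "$tmp/demo.fai" > "$tmp/demo.genome"
```

A count of 0 would mean every line already has tab-separated columns. Regenerating the index from the original FASTA with samtools faidx should also give a clean tab-delimited file.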
