VG mapping paired-end reads: error [xg]: multiple hits for XXX
0
0
Entering edit mode
8 months ago
nkls063408 • 0

Hi everyone, I did create my first graph using vcf from HGDP as follow:

vg construct -r ref.fa -v sub-chrXX.vcf.gz > pXX.vg
vg ids -j $(for i in $(seq 1 22); do echo p${i}.vg; done)
vg index -x all.xg $(for i in $(seq 1 22); do echo p${i}.vg; done)
vg prune -t 8 -k 45 -r p${SLURM_ARRAY_TASK_ID}.vg > pruned${SLURM_ARRAY_TASK_ID}.vg
vg index -t 8 --temp-dir temp -g wg.gcsa pruned{1..22}.vg -p -Z 32768 2>&1

Those steps worked just fine. Then I tried to map reads using the following command:

vg map -x all.xg -g wg.gcsa -f HGDP00607_1.fastq.gz -f HGDP00607_2.fastq.gz > HGDP00607.gam

And I get the following error

error [xg]: multiple hits for $chrUn_JTFH01000458v1_decoy$

However when I map only one set of reads (only forward for example) the mapping step goes just fine

To create the reads I downloaded all the forward and reverse reads available on HGDP for this individual and concatenate them as follow:

cat *1.fastq.gz > concatenated_1.fastq.gz

I've been looking around and cannot find this error reported anywhere. Is there something wrong in my pipeline or is that an issue of the software ?

If I cannot find how to fix it, I was thinking to map forward and reverse reads separately and then merge the 2 gam file together (?). Or maybe should I merge both fastq files into one and map and map the merged file ?

Thank you for your help !

variation-graph reads pangenome vg paired-end • 1.4k views
ADD COMMENT
2
Entering edit mode

I believe this would happen because you have a duplicate sequence in your FASTA (namely the contig that the error indicates). It's kind of confusing that this error comes during mapping though. Ideally the error would happen early in the indexing process.

In any case, you should be able to resolve it by removing all but one duplicate during the index construction step.

You may also be interested in using vg autoindex --workflow map to automate the indexing pipeline, rather than doing all of the steps manually like this.

ADD REPLY
0
Entering edit mode

Hi Jordan thank you for your answer, I am still checking those fastq files but they are quite bigs so can't tell you yet what's wrong with them but I literaly only download the fastq from there and concatenate them (12 fastq forwards and 12 reverse). Will let you know if I find something there.

To remove duplicate while indexing should I use the option -o ? And would that be for both xg and gcsa ?

Otherwise I could run vg autoindex --workflow map pruned{1..22}.vg but before I wanted to know if using the solution i suggested earlier (merge both GAM after mapping or merge fastq before mapping) would trigger any issue in further steps (surjecting to bam files + calling variants) ? With my ressources at my lab, it took us almost 2 months to run the gcsa (ran for 2 weeks in total but regular HPC outage) so I'd love if I could not re-run that part :/

Thanks again

ADD REPLY
0
Entering edit mode

I don't think there's any error in the FASTQ. The duplicate contig would be in the FASTA reference. I don't think there's any built-in functionality to de-duplicate sequences in the reference in vg, so you'd have to handle that as a preprocessing step. As a quick check to make sure this is in fact the issue, you could compare grep ">" ref.fa | wc -l and grep ">" ref.fa | sort | uniq | wc -l.

If I'm right about this issue, then unfortunately yes, you would have to re-index. GCSA2 construction tends to be particularly slow on systems with slow disk I/O, so if you can arrange for a machine with faster drives, you might be able to speed it up.

The way to use autoindex for this pipeline would be:

vg autoindex --workflow map -r ref.fasta $(for c in $(seq 1 22; echo X; echo Y); do echo -v sub-chr${c}.vcf.gz; done)

However, if you have issues with frequent hardware failures, then the manual pipeline might be better. vg autoindex isn't restartable if the hardware fails mid-run.

ADD REPLY
0
Entering edit mode

Hi Jordan, So turnes out I don't have any issue with the reference fasta sequence, no redundant contigs. The mapping step took a long time but completed without any trouble. I decided to map forward and reverse reads separetly and concatenate GAM using "cat" as suggested here for later steps.

I am now merging, creating pack and calling variant. I am still unsure where that previous error came from but so far it is not causing any more issue. Will keep you updated about this behaviour. Thanks again !!

ADD REPLY
0
Entering edit mode

Hi Jordan M Eizenga ,

I finally run different steps :

1- Surject GAM to BAM

2- Call variant with HaplotypleCaller on the BAM

3- Run vg pack on the gam file

And those were fine. However, I do have an error while I ran vg call xg.xg -k concat.gam.pack > HGDP00607_gam.vcf

The error message I get is kinda weird as there is multiple lines with the following message:

Crash report for vg

Crash report for vg

Crash report for vg------------------------------------------------------------- v1.48.0-59-g276aa51e3 "Gallipoli"-------------------------------------------------- v1.48.0-59-g276aa51e3 "Gallipoli" -----------------------------------------v1.48.0-59-g276aa51e3 "Gallipoli" --------------------------------------------------------------

At the end of the file, I saw a segmentation fault error:

0x9 Object "", at 0, in

0x7 Object "", at 0, in

0x6 Object "", at 0xffffffffffffffff, in

0x5 Object "", at 0x2063592, in

0x4 Object "", at ", at 0xffffffffffffffff

0x3 Object "", at 0x1f94ce9, in

0x2 Object "", at 0x1f975b7, in

0x1 Object "", at 0x1f8ec7b/cm/local/apps/slurm/var/spool/job2396054/slurm_script: line

25: 54161 Segmentation fault (core dumped) vg call -t ${SLURM_CPUS_PER_TASK} xg.xg -k concat.gam.pack > HGDP00607_gam.vcf

Do you believe that is due to the error I shared with you in the first place ?

Thanks for your help !

ADD REPLY
0
Entering edit mode

That output doesn't give me a lot to work with, but IMO segmentation faults are always bugs, even if there's something wrong with the input. Probably the best thing to do is make a bug report on the VG GitHub. You might need to provide the data to reproduce this crash for us to do anything with it.

ADD REPLY
0
Entering edit mode

Thank you Jordan for your help. Did post the all thing on their github. Here is the post, https://github.com/vgteam/vg/issues/4071

Thanks !

ADD REPLY

Login before adding your answer.

Traffic: 2607 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6