Reconstructing reference info for CHM13 reads
3.5 years ago
Martin • 0

Hello,

I am trying to reconstruct the reference information for CHM13 HiFi reads. In more detail: given a fastq file of real human data and the reference file, I want to construct a fasta file that contains the same reads but also records their position in the reference. My workflow is as follows:

  1. I download the reads ('SRR11292120_3_subreads.fastq.gz' in my case) from the CHM13 website, and the full assembly 'chm13.draft_v1.1.fasta' as the reference.

  2. I use Hifiasm (https://github.com/chhylp123/hifiasm) to error-correct the reads:

    ./hifiasm -o real_corrected -t 32 --write-paf --write-ec SRX5633451.fastq

  3. I use minimap2 (https://github.com/lh3/minimap2) to align the corrected reads to the reference (I also tested Winnowmap2, which gives similar results):

    minimap2 -d ref.mmi chm13.draft_v1.1.fasta

    minimap2 -cx map-hifi ref.mmi real_corrected.ec.fa > alignment.paf

  4. I wrote a script that takes the resulting paf file alignment.paf and the error-corrected fasta file and creates a new fasta file annotated with the start, end and strand of each read in the reference. For every read with one or more minimap2 alignments, I keep only the best alignment and annotate the read with that reference position.

  5. I check whether the resulting alignments cover each chromosome completely. Some chromosomes are completely covered, but others have between 1 and 4 gaps, each a few thousand base pairs long.
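
For reference, here is a minimal Python sketch of what steps 4 and 5 could look like. It is not the actual script. It assumes that "best alignment" means the PAF record with the most residue matches (column 10), and the header layout (`ref=`, `start=`, `end=`, `strand=`) and all function names are made up for illustration. PAF target coordinates are 0-based with an exclusive end.

```python
from collections import defaultdict


def best_alignments(paf_lines):
    """Keep one PAF record per read: the one with the most residue matches.

    Assumption: 'best' = highest value in PAF column 10 (number of matches).
    """
    best = {}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip empty or malformed lines
        rec = {
            "read": f[0],
            "strand": f[4],
            "ref": f[5],
            "ref_len": int(f[6]),
            "start": int(f[7]),  # 0-based, inclusive
            "end": int(f[8]),    # 0-based, exclusive
            "matches": int(f[9]),
        }
        current = best.get(rec["read"])
        if current is None or rec["matches"] > current["matches"]:
            best[rec["read"]] = rec
    return best


def annotated_header(rec):
    """Build a fasta header carrying the reference position (hypothetical layout)."""
    return (f">{rec['read']} ref={rec['ref']} start={rec['start']} "
            f"end={rec['end']} strand={rec['strand']}")


def uncovered_intervals(records):
    """Return, per reference sequence, the intervals not covered by any record."""
    spans_by_ref = defaultdict(list)
    ref_len = {}
    for rec in records:
        spans_by_ref[rec["ref"]].append((rec["start"], rec["end"]))
        ref_len[rec["ref"]] = rec["ref_len"]

    gaps = {}
    for ref, spans in spans_by_ref.items():
        pos = 0  # everything before pos is already covered
        ref_gaps = []
        for start, end in sorted(spans):
            if start > pos:
                ref_gaps.append((pos, start))
            pos = max(pos, end)
        if pos < ref_len[ref]:
            ref_gaps.append((pos, ref_len[ref]))
        gaps[ref] = ref_gaps
    return gaps


if __name__ == "__main__":
    with open("alignment.paf") as paf:
        best = best_alignments(paf)
    for ref, ref_gaps in uncovered_intervals(best.values()).items():
        print(ref, ref_gaps)
```

Note that a chromosome with no aligned reads at all does not show up in the output of this sketch, and that keeping only one alignment per read can by itself leave gaps where a read was split into several alignments.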

What do you think about this workflow? Is there any way to improve the results, for example with different parameters or with different or additional tools?

minimap2 winnowmap CHM13 alignment hifiasm • 660 views