Issue with picard index in my one chr reference fasta file
0
0
Entering edit mode
7 months ago
Manuel ▴ 50

I have developed a NGS pipeline and I want a quick test that I would like to execute in git action. To do this, I want to subset the fasta genome reference I have to send to gitAction and do the analysis (sending the entire genome reference is impossible).

To do this, I subset the genome reference:

  samtools faidx /mnt/data1/db/gcp-public-data--broad-references/hg38/v0/GIABv1mask/Homo_sapiens_assembly38.GIABv1mask.fasta  chr1 > chr1.fasta

then I create all index files I need (I execute this script for my whole human genome in the past and now for the chr1 only, I assume this is OK for two references fasta).

 #!/bin/bash
set -euo pipefail  # Exit immediately on errors, unset variables, and failures in pipelines

FASTA_PATH="/SomaticPipeline/test2/get_reference_data_for_gitAction/chr1/chr1.fasta"
MASKED_DIR="/SomaticPipeline/test2/get_reference_data_for_gitAction/chr1/GIABv1mask"
EXCLUSIONS_BED="/mnt/data1/db/gcp-public-data--broad-references/hg38/v0/GIABv1mask/GCA_000001405.15_GRCh38_GRC_exclusions.bed"
MASKED_FASTA="${MASKED_DIR}/chr1.GIABv1mask.fasta"

# Step 1: Create subfolder for masked reference & copy exclusion BED file
echo "1 - Creating masked reference subfolder and copying exclusions BED..."
mkdir -p "$MASKED_DIR"
cp "$EXCLUSIONS_BED" "$MASKED_DIR"

# Step 2: Mask FASTA using BEDTools
echo "2 - Masking FASTA file..."
apptainer exec \
--containall \
--bind /mnt:/mnt,/home/dominm:/home/dominm \
docker://quay.io/biocontainers/bedtools:2.24--1 \
bedtools maskfasta \
-fi "$FASTA_PATH" \
-bed "$MASKED_DIR/$(basename "$EXCLUSIONS_BED")" \
-fo "$MASKED_FASTA"

# Step 3: Generate genome size file using masked FASTA
echo "3 - Generating genome size file..."
apptainer exec \
--containall \
--bind /mnt:/mnt,/home/dominm:/home/dominm \
docker://trinhanne/pisces:5.2.10 \
dotnet /app/CreateGenomeSizeFile_5.2.10.49/CreateGenomeSizeFile.dll \
-g "$MASKED_DIR" \
-s "GATK Bundle Homo_sapiens_assembly38.fasta masked with GIAB v1 BED" \
-o "$MASKED_DIR"

# Step 4: Index masked FASTA with BWA
echo "4 - Indexing masked FASTA with BWA..."
apptainer exec \
--containall \
--bind /mnt:/mnt,/home/dominm:/home/dominm \
docker://quay.io/biocontainers/bwa:0.7.18--he4a0461_0 \
bwa index -6 "$MASKED_FASTA"

echo " All steps completed successfully!"

So the previous script works fine (no error reported) when using the whole human genome as well as when using the chr1 subsample. However, the problem arises when executing my pipeline bwa shows this error:

 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Input contains paired reads but no SECOND_END_FASTQ specified.
        at picard.sam.SamToFastq.handleRecord(SamToFastq.java:316)
        at picard.sam.SamToFastq.doWork(SamToFastq.java:205)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:280)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:105)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:115)
[Sat Feb 08 21:33:51 GMT 2025] Executing as dominm@wglsbi01 on Linux 4.18.0-553.36.1.el8_10.x86_64 amd64; OpenJDK 64-Bit Server VM 21.0.1-internal-adhoc.conda.src; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: 3.1.1
[main] Version: 0.7.18-r1243-dirty
[main] CMD: /usr/local/bin/bwa mem -Y -L 50 -K 100000000 -v 3 /home/dominm/SomaticPipeline/test2/get_reference_data_for_gitAction/chr1/GIABv1mask/chr1.GIABv1mask.fasta /dev/stdin
[main] Real time: 3.112 sec; CPU: 0.119 sec
INFO    2025-02-08 21:33:51     SamAlignmentMerger      Processing SAM file(s): [/dev/stdin]
INFO    2025-02-08 21:33:52     AbstractAlignmentMerger Wrote 0 alignment records and 50010 unmapped reads.
[Sat Feb 08 21:33:52 GMT 2025] picard.sam.MergeBamAlignment done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=194641920

So If I run my pipeline with the whole genome reference, bwa works but if I use the chr1 subsample, I get this confusing error message.

SamToFastq • 505 views
ADD COMMENT
1
Entering edit mode

that is not a problem with bwa or with any indexing , it's a problem with a downstream program: picard/SamToFastq is invoked but an argument SECOND_END_FASTQ is missing. https://gatk.broadinstitute.org/hc/en-us/articles/360036485372-SamToFastq-Picard#--SECOND_END_FASTQ

ADD REPLY

Login before adding your answer.

Traffic: 4269 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6