Question: Low-coverage whole-genome sequencing (LC-WGS) or ultra low-pass WGS analysis
gravatar for CuriusScientist
8 months ago by
Institute for Cancer Research, Dept. of Tumor Biology, Oslo University Hospital, Oslo, Norway
CuriusScientist50 wrote:

I am new to the genome analysis and want to analyse ultra low-pass WGS data for copy number variation analysis.

It will be really helpful if the members of the community can check if the approach I have taken is correct

# fetch information for read groups
header=$(zcat ${sample_name}_R1_001.fastq.gz | head -n 1)
instrument_Id=$(echo $sn | cut -f 1 -d":")
run_number=$(echo $sn | cut -f 2 -d":")
flowcell_Id=$(echo $sn | cut -f 3 -d":")
lane_number=$(echo $sn | cut -f 4 -d":")
tile_number=$(echo $sn | cut -f 5 -d":")
read_number=$(echo $sn | cut -f 2 -d"/")
echo "Cores = $(( $(lscpu | awk '/^Socket\(s\)/{ print $2 }') * $(lscpu | awk '/^Core\(s\) per socket/{ print $4 }') ))"
echo $Cores

# Mapping using BWA-mem
bwa mem -t $Cores -M -R '@RG\tID:'$ID'\tSM:'${sample_name}'\tPL:ILLUMINA\tLB:'$LB Homo_sapiens.GRCh38.dna.primary_assembly.fa ${sample_name}_R1_001.fastq.gz ${sample_name}_R2_001.fastq.gz >  ${sample_name}.sam

# Because BWA can sometimes leave unusual FLAG information on SAM records, it is helpful when working with many tools to first clean up read pairing information and flags
samtools fixmate ${sample_name}.sam ${sample_name}.fixmate.bam

# Keep only uniquely mapped read
# -F 780: include reads with none of the following tags , i.e. sam flag 780, reads unmapped, mate unmapped, not a primary alignemnet read fails platform/vendor quality checks
samtools view -b -q 0 -f 2 -F 780 -o ${sample_name}.fixmate.unique.bam ${sample_name}.fixmate.bam
# Keep only uniquely mapped reads and fliter the ones with quality score < 10 (-q 10)
samtools view -b -q 10 -f 2 -F 780 -o ${sample_name}.fixmate.unique.filtered.bam ${sample_name}.fixmate.bam

# sort mapped reads
samtool sort ${sample_name}.fixmate.bam -o ${sample_name}.fixmate.sorted.bam
samtool sort ${sample_name}.fixmate.unique.bam -o ${sample_name}.fixmate.sorted.unique.bam
samtool sort ${sample_name}.fixmate.unique.filtered.bam -o ${sample_name}.fixmate.sorted.unique.filtered.bam

# index mapped reads
samtools index ${sample_name}.fixmate.sorted.bam
samtools index ${sample_name}.fixmate.sorted.unique.bam
samtools index ${sample_name}.fixmate.sorted.unique.filtered.bam

# Remove duplicates
picard MarkDuplicates METRICS_FILE=${sample_name}.fixmate.sorted.unique.filtered.dedup.metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=STRICT INPUT=${sample_name}.fixmate.sorted.unique.filtered.bam OUTPUT= ${sample_name}.fixmate.sorted.unique.filtered.dedup.bam

picard SortSam.jar I=${sample_name}.fixmate.sorted.unique.filtered.dedup.bam O=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam SO=coordinate

picard BuildBamIndex INPUT=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam

What I want to know is this

  • have I missed any step
  • have I done some unnecessary
  • is the order of the steps and parameters correct?

I also want to clarify that after this, I am planning to use ichorCNA

  • GC-content bias correction (using HMMcopy) a. Computing read coverage from ULP-WGS b. Data correction and normalization
  • CNA prediction and estimation of tumor fraction of cfDNA
ADD COMMENTlink modified 7 months ago by Biostar ♦♦ 20 • written 8 months ago by CuriusScientist50
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 812 users visited in the last hour