Question: Low-coverage whole-genome sequencing (LC-WGS) or ultra low-pass WGS analysis
0
gravatar for CuriusScientist
11 months ago by
Institute for Cancer Research, Dept. of Tumor Biology, Oslo University Hospital, Oslo, Norway
CuriusScientist50 wrote:

I am new to the genome analysis and want to analyse ultra low-pass WGS data for copy number variation analysis.

It will be really helpful if the members of the community can check if the approach I have taken is correct

# fetch information for read groups
header=$(zcat ${sample_name}_R1_001.fastq.gz | head -n 1)
instrument_Id=$(echo $sn | cut -f 1 -d":")
run_number=$(echo $sn | cut -f 2 -d":")
flowcell_Id=$(echo $sn | cut -f 3 -d":")
lane_number=$(echo $sn | cut -f 4 -d":")
tile_number=$(echo $sn | cut -f 5 -d":")
read_number=$(echo $sn | cut -f 2 -d"/")
ID="$flowcell_Id"_"$lane_number"_"${sample_name}"
PL="ILLUMINA"
LB="${sample_name}"_"$lane_number"
echo "Cores = $(( $(lscpu | awk '/^Socket\(s\)/{ print $2 }') * $(lscpu | awk '/^Core\(s\) per socket/{ print $4 }') ))"
echo $Cores

# Mapping using BWA-mem
bwa mem -t $Cores -M -R '@RG\tID:'$ID'\tSM:'${sample_name}'\tPL:ILLUMINA\tLB:'$LB Homo_sapiens.GRCh38.dna.primary_assembly.fa ${sample_name}_R1_001.fastq.gz ${sample_name}_R2_001.fastq.gz >  ${sample_name}.sam

# Because BWA can sometimes leave unusual FLAG information on SAM records, it is helpful when working with many tools to first clean up read pairing information and flags
samtools fixmate ${sample_name}.sam ${sample_name}.fixmate.bam

# Keep only uniquely mapped read
# -F 780: include reads with none of the following tags , i.e. sam flag 780, reads unmapped, mate unmapped, not a primary alignemnet read fails platform/vendor quality checks
samtools view -b -q 0 -f 2 -F 780 -o ${sample_name}.fixmate.unique.bam ${sample_name}.fixmate.bam
# Keep only uniquely mapped reads and fliter the ones with quality score < 10 (-q 10)
samtools view -b -q 10 -f 2 -F 780 -o ${sample_name}.fixmate.unique.filtered.bam ${sample_name}.fixmate.bam

# sort mapped reads
samtool sort ${sample_name}.fixmate.bam -o ${sample_name}.fixmate.sorted.bam
samtool sort ${sample_name}.fixmate.unique.bam -o ${sample_name}.fixmate.sorted.unique.bam
samtool sort ${sample_name}.fixmate.unique.filtered.bam -o ${sample_name}.fixmate.sorted.unique.filtered.bam

# index mapped reads
samtools index ${sample_name}.fixmate.sorted.bam
samtools index ${sample_name}.fixmate.sorted.unique.bam
samtools index ${sample_name}.fixmate.sorted.unique.filtered.bam

# Remove duplicates
picard MarkDuplicates METRICS_FILE=${sample_name}.fixmate.sorted.unique.filtered.dedup.metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=STRICT INPUT=${sample_name}.fixmate.sorted.unique.filtered.bam OUTPUT= ${sample_name}.fixmate.sorted.unique.filtered.dedup.bam

picard SortSam.jar I=${sample_name}.fixmate.sorted.unique.filtered.dedup.bam O=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam SO=coordinate

picard BuildBamIndex INPUT=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam

What I want to know is this

  • have I missed any step
  • have I done some unnecessary
  • is the order of the steps and parameters correct?

I also want to clarify that after this, I am planning to use ichorCNA

  • GC-content bias correction (using HMMcopy) a. Computing read coverage from ULP-WGS b. Data correction and normalization
  • CNA prediction and estimation of tumor fraction of cfDNA
ADD COMMENTlink modified 10 months ago by Biostar ♦♦ 20 • written 11 months ago by CuriusScientist50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 869 users visited in the last hour