Why did I achieve shorter than initial reads subset after aligned reads extraction.
1
0
Entering edit mode
2.1 years ago

Hello dear colleages!

I have recently faced some problem. I have worked with long WGS reads. Firstly I have filtered the longest subset of reads, and aligned them to the custom sequence with several structural variants of the region of interest. After that I have extracted reads mapped to one of variants and detected some reads be much shorter than previously aligned in extracted subset. It is not clear for me what was wrong? May alignment params, or extraction procedure be wrong?

So here is commands I used:

<$INITIAL_READS bioawk -c fastx \
'length($seq) > 10000 \
{ print "@" $name "\n" $seq "\n+\n" $qual }' \
> $FILTERED_READS.fastq

minimap2 -a --secondary=no -L -2 --sam-hit-only -x map-ont -H \
$STRUCTURAL_VARIANTS $FILTERED_READS > lr_2_vars.sam

samtools faidx $STRUCTURAL_VARIANTS
samtools view -S -b lr_2_vars.sam | \
samtools sort > lr_2_vars.sorted.bam
samtools index lr_2_vars.sorted.bam

samtools view -t ${STRUCTURAL_VARIANTS}.fai \
-b lr_2_vars.sorted.bam VARIANT_OF_INTEREST \
> VARIANT.bam
samtools fastq -F 4 VARIANT.bam > EXTRACTED_READS.fastq

So I had this result:

<$EXTRACTED_READS bioawk -c fastx \
'{ print length($seq)}' | sort -V | head

52
54
55
56
56
56
56
62
73
77
minimap2 samtools alignment • 563 views
ADD COMMENT
0
Entering edit mode
2.1 years ago

You would need to look at the CIGAR string in the BAM file. Reads in a BAM file may be represented with soft or hard clips.

Hard clipped reads do not contain the unaligned regions only the aligned regions. Thus when converting back to fastq you lose those regions.

To be honest I have never seen a proper explanation for the reasons of why some alignments are represented with soft and others with hard clips. It probably has to do with the length of the clip and wether there are other alignments as well.

There is a flag in minimap2 that might help:

-Y           use soft clipping for supplementary alignments
ADD COMMENT
0
Entering edit mode

Thank you a lot Istvan Albert ! I had thought there be three options for partially aligned reads: 1) without clipping 2) soft clipping with lowcase masking 3) hard clipping with reads been truncated

I have to read more on specification of SAM format and to look on issues on minimap2's github.

ADD REPLY

Login before adding your answer.

Traffic: 2021 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6