Hi everyone
I have a question about using fastp together with gatk CollectAlignmentSummaryMetrics.
I used fastp for preprocessing and set the length-required option (--length_required) to 30.
However, when I run gatk CollectAlignmentSummaryMetrics to get an overview of my aligned reads, it reports a minimum read length of about 19-20 bp.
That probably reflects the portion of those 30+ bp reads that actually aligned to the reference you are using. The remainder of each read must be "soft-clipped" because those bases did not align; you can confirm this by checking the CIGAR string for those alignments or by viewing them in a genome viewer. If you check the length of the reads going into the alignment, they should all be 30+ bp.
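To make the soft-clipping idea concrete, here is a minimal Python sketch that splits a CIGAR string into full read length, aligned bases, and soft-clipped bases. The function name cigar_lengths and the example CIGAR 11S19M are made up for illustration, and whether your metrics tool counts lengths exactly this way is an assumption, not something confirmed here.

```python
import re

# CIGAR operations that consume bases of the read itself (per the SAM spec).
QUERY_OPS = set("MIS=X")
# Operations whose bases are placed against the reference (match/mismatch).
ALIGNED_OPS = set("M=X")


def cigar_lengths(cigar):
    """Return (read_length, aligned_bases, soft_clipped_bases) for a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    read_length = sum(int(n) for n, op in ops if op in QUERY_OPS)
    aligned = sum(int(n) for n, op in ops if op in ALIGNED_OPS)
    clipped = sum(int(n) for n, op in ops if op == "S")
    return read_length, aligned, clipped


# Hypothetical 30 bp read where only the last 19 bases aligned.
print(cigar_lengths("11S19M"))  # (30, 19, 11)
```

So a read that passed the 30 bp filter can still show up with only 19 aligned bases once the soft-clipped part is left out.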
How can I check the CIGAR string? Also, can you recommend a genome browser I could use?
Look at your alignment file records (with samtools view and such) and check the 6th field (CIGAR strings are described on page 8 in the SAM file format spec: https://samtools.github.io/hts-specs/SAMv1.pdf). Soft-clipped alignment CIGAR strings will begin/end with S, e.g. 15S10M.
Integrative Genomics Viewer (IGV): https://igv.org/doc/desktop/
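If you would rather scan many records than eyeball them, here is a rough Python sketch of the same check. It reads SAM text lines (the kind samtools view prints), takes the 6th tab-separated field, and reports records whose CIGAR begins or ends with a soft clip. The function name soft_clipped_records and the two example records are hypothetical, not taken from your data.

```python
import re

# Soft clip at the start or end of the CIGAR; a hard clip (H) may sit outside it.
SOFT_CLIP_EDGE = re.compile(r"^(\d+H)?\d+S|\d+S(\d+H)?$")


def soft_clipped_records(sam_lines):
    """Yield (read_name, cigar, read_length) for soft-clipped alignments."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, cigar, seq = fields[0], fields[5], fields[9]
        if SOFT_CLIP_EDGE.search(cigar):
            # SEQ holds the full read, soft-clipped bases included ("*" if absent).
            read_length = None if seq == "*" else len(seq)
            yield name, cigar, read_length


# Two made-up records: one soft-clipped, one fully aligned.
example = [
    "read1\t0\tchr1\t100\t60\t15S10M\t*\t0\t0\t" + "A" * 25 + "\t" + "I" * 25,
    "read2\t0\tchr1\t200\t60\t25M\t*\t0\t0\t" + "C" * 25 + "\t" + "I" * 25,
]
print(list(soft_clipped_records(example)))  # [('read1', '15S10M', 25)]
```

With real data you could pass sys.stdin instead of the example list and pipe the output of samtools view into the script.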
Understood and thanks