Is there any aligner which supports soft clipping a fixed number of bases from reads prior to alignment?
7.6 years ago
Dan D 7.3k

I'm working on a project where the read data consists of 6-mers ligated to the 5' end of a sequence. These 6-mers are important, but must be excluded during the alignment of the reads to which they are attached.

My current method of evaluating these experimental data consists of trimming off the 6-mers, aligning the trimmed reads, and then using the read query-name to go back and retrieve the 6-mer from the original FASTQ data.

This method currently works just fine. However, it would be far more efficient if I could use an aligner with the ability to soft-clip (not trim!) the reads, such that the clipped bases from each read aren't considered during the alignment, but will show up in the BAM data with the appropriate CIGAR soft-clipping designation.

Does anyone know if there's any aligner out there which supports this option? I've searched through several (bwa, bowtie, SOAP), but unless I'm misunderstanding the documentation, none of them support what I'm trying to do.

alignment soft-clipping • 2.4k views
7.6 years ago
lh3 33k

I would write a small script to move the 6bp to the FASTQ header:

my ($name, $seq) = ('', '');
while (<>) {
    chomp; # strip the newline so substr() output stays clean
    if ($. % 4 == 1) {
        $name = (split)[0]; # save the read name (drops any existing comment)
    } elsif ($. % 4 == 2) {
        $seq = $_; # save the full sequence
    } elsif ($. % 4 == 0) {
        # print the FASTQ header with the first 6bp in b6/q6 tags
        print "$name b6:Z:" . substr($seq, 0, 6), "\tq6:Z:" . substr($_, 0, 6) . "\n";
        # print sequence and quality with first 6bp trimmed
        print substr($seq, 6), "\n+\n";
        print substr($_, 6), "\n";
    }
}

and then (with the script above saved under any name; script.pl here is just a placeholder):

zcat myreads.fq.gz | perl script.pl | bwa mem -C ref.fa - | gzip -1 > out.sam.gz

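Each alignment in the output will carry the "b6" and "q6" tags. In general, option "-C" copies the FASTQ comment to the SAM output, so you can use this approach to attach any meta information about reads, as long as the comment is written as valid SAM tags. bwa aln also has a barcode option (-B), which trims the given number of bases from the 5' end before mapping and records them in the BC tag.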
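To get the 6-mers back out of the alignments, the tags can be read directly from each SAM line. Below is a minimal sketch in Python (not part of the original answer). The function name parse_barcode_tags and the example record are made up for illustration, and it assumes the tags were written as b6:Z: and q6:Z: by the script above.

```python
def parse_barcode_tags(sam_line):
    """Return (read_name, b6, q6) for one SAM alignment line.

    Returns None for header lines. b6/q6 are None if the tag is absent.
    Assumes the hypothetical b6/q6 tag names used by the script above.
    """
    if sam_line.startswith("@"):
        return None  # SAM header line, no alignment fields
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        # Split at most twice: quality strings may themselves contain ':'.
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return fields[0], tags.get("b6"), tags.get("q6")


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = "\t".join([
        "read1", "0", "chr1", "100", "60", "8M", "*", "0", "0",
        "ACGTACGT", "IIIIIIII", "NM:i:0", "b6:Z:ACGTAC", "q6:Z:II:III",
    ])
    print(parse_barcode_tags(example))  # ('read1', 'ACGTAC', 'II:III')
```

Since the 6-mer travels with every alignment of the read, this removes the need to look it up again in the original FASTQ by query name.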

Very elegant solution. Thank you!

