Question

Coverage drops in fastq alignment against custom Immunoglobulin reference

0

Entering edit mode

2.6 years ago

Gama313 ▴ 120

I am working on Hiseq2000/2500 single end reads on RNASeq leukemia samples. I am interested in aligning all the reads beloging to the Immunoglobulin genes (Ig) for further analysis. The task is difficult for two main reasons:

Final Ig genes are the result of a "collage" of genomic regions and alignment is difficult;
Ig are hypermutated, meaning that there are a lot of mutations interfering with the alignment;

I decided to develop my own pipeline since those available are inappropriate. I post a brief resume of the workflow:

Star alignment agains hg38;
Extraction of unmapped reads;
Extraction of reads mapping to the Ig locus;
De novo construction of the Ig with Trinity;
Alignment of de novo sequences with IgBlast;
Select the "correct" Ig;
Align sequences against the custom Ig reference with bowtie2 (--very-sensitive-local).

I've analyzed around 100 samples, 70% of those align correctly displaying a uniform coverage. The 30% of samples show a non-uniform coverage profile (See Fig. attached). As you can see, there is the common 3' bias with the coverage increasing at the end of the Ig. However, there is a huge drop in the middle of the Ig and I cannot understand whether is a kind of artifact lacking those reads or is the alignment parameters wrong.

enter image description here To resume, I cannot understand how is possible to observe different coverage profiles for samples belonging to the very same experiment. Any suggestion will be helpful

Alignment RNASeq Immunoglobulin • 422 views

ADD COMMENT • link 2.6 years ago by Gama313 ▴ 120