Question: How to align RNA-seq data to a PolyA tail "genome"
18 months ago
mb31410 wrote:


I think my RNA-binding protein (RBP) of interest binds to polyA tails. I have fastq.gz files from a CLIP-seq experiment and I want to see whether any of my reads have high polyA content. I tried using the STAR aligner (version 2.5.3a) to index a very short sequence of A's, and the indexing succeeded:

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir PolyA/star_index --genomeFastaFiles PolyA/PolyA.fasta --genomeSAindexNbases 2 --limitGenomeGenerateRAM 33000000000

I then tried aligning my fastq.gz file to the indexed polyA genome. The job ran for 24 hours and then aborted, whereas aligning to the human genome finished in less than 20 minutes. This is the command I used:

STAR --runMode alignReads \
--runThreadN 16 \
--genomeDir ../genomes/PolyA/star_index \
--genomeLoad LoadAndRemove \
--readFilesIn pathtomyfile.fastq.gz \
--readFilesCommand zcat \
--outFilterMultimapNmax 20 \
--outFileNamePrefix myfileout.bam \
--outSAMattributes All \
--outSAMtype BAM Unsorted \
--outFilterMismatchNmax 10

Does anyone have suggestions as to why this failed, or other recommendations for measuring how much polyA content my samples have?

Thank you!

modified 18 months ago

I do not really see the point in doing that. Polyadenylation is a post-transcriptional modification, meaning that dedicated enzymes add the polyA tail to the pre-mRNA after transcription. The polyA tail is not encoded in the gene in the genome, so aligning to a polyA "genome" won't help you. I would rather use a dedicated trimming tool such as bbduk, Trimmomatic or cutadapt to trim polyA tails (please use the search function), and then check what percentage of the reads contained that pattern.
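If you only want a quick number before setting up a trimmer, a rough alternative (not the trimming approach described above) is to scan the read sequences directly and count how many contain a long run of A's. The sketch below assumes standard 4-line FASTQ records in a gzip-compressed file; the function name `polya_fraction` and the default minimum run length of 15 A's are arbitrary choices for illustration, not established cutoffs. Depending on your library's strandedness, tails may show up as poly(T) instead, so the base to look for is a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads whose sequence contains a run of at least
# MIN_RUN consecutive copies of BASE (default: 15 x "A").
# Assumes 4-line FASTQ records (sequence on every 2nd line of each record)
# and gzip-compressed input.
# Output (tab-separated): matching_reads  total_reads  percent_matching
polya_fraction() {
  local fq="$1"
  local min_run="${2:-15}"   # arbitrary threshold, tune for your data
  local base="${3:-A}"       # use T if tails appear on the opposite strand
  gzip -dc "$fq" | awk -v n="$min_run" -v b="$base" '
    BEGIN { pat = ""; for (i = 0; i < n; i++) pat = pat b }
    NR % 4 == 2 {
      total++
      if (index(toupper($0), pat) > 0) hits++
    }
    END {
      pct = (total > 0) ? 100 * hits / total : 0
      printf "%d\t%d\t%.2f\n", hits + 0, total + 0, pct
    }'
}

# Only run when called with arguments, e.g.:
#   ./polya_fraction.sh pathtomyfile.fastq.gz 15 A
if [ "$#" -gt 0 ]; then
  polya_fraction "$@"
fi
```

Running it once with `A` and once with `T` gives a first impression of how much tail-like sequence is in the library; a trimmer's report (reads trimmed vs. total) will be more robust because it tolerates sequencing errors inside the run, which this exact-match scan does not.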

written 18 months ago
