Question: How to align RNA-seq data to a PolyA tail "genome"
3 months ago, mb314 wrote:


I think my RBP of interest is binding to polyA tails. I have fastq.gz files from a CLIP-seq experiment and I want to see whether any of my reads have high polyA content. I used the STAR aligner (version 2.5.3a) to index a very short sequence of A's, which worked:

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir PolyA/star_index --genomeFastaFiles PolyA/PolyA.fasta --genomeSAindexNbases 2 --limitGenomeGenerateRAM 33000000000

I then tried aligning my fastq.gz to the indexed polyA genome. The job ran for 24 hours and then aborted, whereas aligning to the human genome finished in less than 20 minutes. The command I used is below:

STAR --runMode alignReads \
--runThreadN 16 \
--genomeDir ../genomes/PolyA/star_index \
--genomeLoad LoadAndRemove \
--readFilesIn pathtomyfile.fastq.gz \
--readFilesCommand zcat \
--outFilterMultimapNmax 20 \
--outFileNamePrefix myfileout.bam \
--outSAMattributes All \
--outSAMtype BAM Unsorted \
--outFilterMismatchNmax 10

Does anyone have suggestions as to why this failed, or other recommendations for measuring how much polyA content I have in my samples?

Thank you!

modified 3 months ago by h.mon • written 3 months ago by mb314

I do not really see the point of doing that. Polyadenylation is a post-transcriptional modification, meaning that dedicated enzymes add the polyA tail to the pre-mRNA after transcription. The polyA tail is not encoded in the genome, so alignment won't help you. I would rather use a dedicated trimming tool such as bbduk, trimmomatic or cutadapt to trim polyA tails (please use the search function), and then check what percentage of the reads contained that pattern.
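
As a quick sanity check before setting up a trimming tool, the "what percentage of reads contain that pattern" step can be approximated with standard shell tools. The following is only a rough sketch, not a substitute for bbduk or cutadapt: the function name `count_polya_reads` and the default threshold of 15 consecutive A's are assumptions made for illustration, and it assumes plain 4-line FASTQ records and counts exact runs of A (no mismatches allowed).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR, cutadapt or any other tool):
# count reads in a gzipped FASTQ that contain a run of at least MIN_RUN A's.
# Assumes 4-line FASTQ records; the default of 15 is an arbitrary choice.
count_polya_reads() {
    local fastq_gz="$1"
    local min_run="${2:-15}"
    gzip -dc "$fastq_gz" | awk -v n="$min_run" '
        # Build a string of n A characters to search for.
        BEGIN { pat = sprintf("%" n "s", ""); gsub(/ /, "A", pat) }
        # Line 2 of every 4-line record is the read sequence.
        NR % 4 == 2 {
            total++
            if (index(toupper($0), pat) > 0) hits++
        }
        # Output: total reads, reads with a polyA run, percentage.
        END {
            pct = (total > 0) ? 100 * hits / total : 0
            printf "%d\t%d\t%.2f\n", total, hits, pct
        }'
}
```

It could be called as `count_polya_reads pathtomyfile.fastq.gz 15`, which prints three tab-separated fields: total reads, reads containing the run, and the percentage. Because it only matches perfect runs, it will likely undercount tails with sequencing errors, which the trimming tools handle more gracefully.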

modified 3 months ago • written 3 months ago by ATpoint