I think my RBP of interest is binding to polyA tails. I have fastq.gz files from a CLIP-seq experiment and I want to see whether any of my reads have high polyA content. I used the STAR aligner (version 2.5.3a) to index a very short sequence of A's, and that step succeeded:
STAR --runMode genomeGenerate --runThreadN 16 --genomeDir PolyA/star_index --genomeFastaFiles PolyA/PolyA.fasta --genomeSAindexNbases 2 --limitGenomeGenerateRAM 33000000000
I then tried aligning my fastq.gz to the indexed polyA genome. The job ran for 24 hours and then aborted, whereas aligning to the human genome finished in less than 20 minutes. The command I used is below:
STAR --runMode alignReads \
  --runThreadN 16 \
  --genomeDir ../genomes/PolyA/star_index \
  --genomeLoad LoadAndRemove \
  --readFilesIn pathtomyfile.fastq.gz \
  --readFilesCommand zcat \
  --outFilterMultimapNmax 20 \
  --outFileNamePrefix myfileout.bam \
  --outSAMattributes All \
  --outSAMtype BAM Unsorted \
  --outFilterMismatchNmax 10
Does anyone have suggestions as to why this failed, or other recommendations for measuring how much polyA content I have in my samples?
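For reference, the kind of alignment-free check I have in mind would scan the reads directly instead of mapping them. Below is a rough Python sketch of that idea, not a validated pipeline. It assumes a standard four-line-per-record FASTQ; the function names and the thresholds (`min_run=15`, `min_fraction=0.8`) are arbitrary placeholders I made up for illustration, not established cutoffs.

```python
import gzip
import re


def longest_a_run(seq):
    """Return the length of the longest uninterrupted run of A in seq (0 if none)."""
    runs = re.findall(r"A+", seq.upper())
    return max((len(run) for run in runs), default=0)


def polya_summary(fastq_path, min_run=15, min_fraction=0.8):
    """Count reads that look polyA-rich in a FASTQ file (plain or gzipped).

    Assumes four lines per record, with the sequence on the second line.
    min_run and min_fraction are illustrative thresholds, not recommendations.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = with_run = a_rich = 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # keep only the sequence line of each record
                continue
            seq = line.strip().upper()
            if not seq:
                continue
            total += 1
            # Criterion 1: the read contains a long uninterrupted stretch of A
            if longest_a_run(seq) >= min_run:
                with_run += 1
            # Criterion 2: the read is mostly A overall
            if seq.count("A") / len(seq) >= min_fraction:
                a_rich += 1
    return {"reads": total, "with_a_run": with_run, "a_rich": a_rich}


# Example call (the path is the same placeholder as in my STAR command):
# print(polya_summary("pathtomyfile.fastq.gz"))
```

The two counts answer slightly different questions: `with_a_run` flags reads carrying a contiguous A stretch, while `a_rich` flags reads that are mostly A even if interrupted by a few other bases.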