I was wondering if someone could help me. I am trying to check how well the ribosomal depletion step worked in my sequencing data, using the STAR aligner. My plan was to align the .fastq file to a Homo sapiens rRNA-only reference and write out a file containing all the unaligned reads (this is the file I would do my analysis on, since it would not contain the rRNA-contaminated reads). I then planned to align that file to my GRCh38 reference genome. However, I keep getting an error and it is not working.
Can someone help? I tried reading the manual, and it says I must build an index first. I tried, but I don't have a GTF file for my rRNA-only reference.
STAR --runThreadN 20 --genomeDir ~/directory/to my indexed/ribosomal/fasta/file --readFilesIn patient.fastq --outReadsUnmapped Fastx --genomeFastaFiles Hsapiens rRNA only reference file.fasta --sjdbOverhang max(ReadLength)-1
*Also, does anyone know the code to get it to output only the unmapped reads as a FASTQ file?
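For reference, the command above mixes index-building options (--genomeFastaFiles, --sjdbOverhang) with alignment options, whereas STAR expects them in two separate runs. Below is a minimal sketch of that split, not a tested pipeline. The file and directory names (Hsapiens_rRNA.fasta, rRNA_index, patient.fastq) and the function names are hypothetical. It assumes a GTF is not needed when no splice-junction annotation is used, and it floors the small-genome value the STAR manual suggests for --genomeSAindexNbases, min(14, log2(GenomeLength)/2 - 1).

```shell
#!/usr/bin/env bash
# Hypothetical file names; replace with your own paths.
RRNA_FASTA="Hsapiens_rRNA.fasta"
RRNA_INDEX="rRNA_index"
READS="patient.fastq"

# Small genomes need a smaller suffix-array index. The STAR manual suggests
# min(14, log2(GenomeLength)/2 - 1); flooring to an integer is an assumption.
sa_index_nbases() {
  awk -v len="$1" 'BEGIN {
    n = int(log(len) / log(2) / 2 - 1)
    if (n > 14) n = 14
    print n
  }'
}

# Step 1: build the rRNA index (no GTF, so no --sjdbOverhang either).
build_rrna_index() {
  genome_len=$(grep -v '^>' "$RRNA_FASTA" | tr -d '\n' | wc -c)
  mkdir -p "$RRNA_INDEX"
  STAR --runMode genomeGenerate \
       --runThreadN 20 \
       --genomeDir "$RRNA_INDEX" \
       --genomeFastaFiles "$RRNA_FASTA" \
       --genomeSAindexNbases "$(sa_index_nbases "$genome_len")"
}

# Step 2: align reads to the rRNA index and keep the reads that do not map.
align_to_rrna() {
  STAR --runThreadN 20 \
       --genomeDir "$RRNA_INDEX" \
       --readFilesIn "$READS" \
       --outReadsUnmapped Fastx \
       --outSAMtype BAM Unsorted \
       --outFileNamePrefix rRNA_
}
```

Running `build_rrna_index && align_to_rrna` should leave the unmapped reads in rRNA_Unmapped.out.mate1, which is in FASTQ format when the input is FASTQ and can be passed to --readFilesIn for the GRCh38 alignment.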
Instead of STAR, use bbduk.sh from BBMap.
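A minimal sketch of what that could look like, assuming a single-end FASTQ and an rRNA FASTA as the k-mer reference. The file names and the function name are hypothetical, and k=31 is only a common starting point, not a tuned value.

```shell
#!/usr/bin/env bash
# Hypothetical rRNA reference; replace with your own FASTA.
RRNA_FASTA="Hsapiens_rRNA.fasta"

# Usage: remove_rrna_with_bbduk <reads.fastq> <clean.fastq> <rrna.fastq> <stats.txt>
# Reads sharing a k-mer with the reference go to outm=, the rest go to out=.
remove_rrna_with_bbduk() {
  bbduk.sh in="$1" ref="$RRNA_FASTA" \
           out="$2" outm="$3" \
           k=31 stats="$4"
}
```

For example, `remove_rrna_with_bbduk patient.fastq patient_clean.fastq patient_rRNA.fastq rRNA_stats.txt` would write the non-rRNA reads to patient_clean.fastq, and the stats file summarizes how many reads matched the reference, which also gives a rough check of the depletion step.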
Why do you recommend it? Is it easier? From my understanding, STAR is the best aligner.
It is easier and does the job. But if you are planning to do differential expression, then follow @grant.hovhannisyan's suggestion.
What downstream analysis are you planning to do after getting the non-rRNA BAM file?
I am planning to generate counts to do differential expression and pathway analysis.
Then I recommend not bothering with removing the rRNA reads during mapping. Just map all your data to the entire reference, do the counting (I guess you know that STAR can also do the counting), and then see how many counts you have for rRNA genes. You can simply remove those genes before the DE analysis.
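A rough sketch of this workflow, under a few assumptions: the GRCh38 index was built with a GTF (or one is supplied at mapping time), the GTF marks rRNA genes with gene_type or gene_biotype set to "rRNA" or "Mt_rRNA" as GENCODE and Ensembl do, and the unstranded column (column 2) of STAR's ReadsPerGene.out.tab is the one of interest. All file names and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical paths; replace with your own.
GENOME_INDEX="GRCh38_index"
READS="patient.fastq"

# Map everything to the full genome and let STAR count reads per gene.
align_and_count() {
  STAR --runThreadN 20 \
       --genomeDir "$GENOME_INDEX" \
       --readFilesIn "$READS" \
       --quantMode GeneCounts \
       --outSAMtype BAM SortedByCoordinate \
       --outFileNamePrefix patient_
}

# Print the gene_id of every gene annotated as rRNA in a GTF.
rrna_gene_ids() {
  awk -F'\t' '$3 == "gene" && $9 ~ /type "(rRNA|Mt_rRNA)"/ {
    if (match($9, /gene_id "[^"]+"/)) print substr($9, RSTART + 9, RLENGTH - 10)
  }' "$1" | sort -u
}

# Print "<rRNA counts><TAB><all gene counts>" from a ReadsPerGene.out.tab file.
# The first four lines of that file are summary rows (N_unmapped, ...).
rrna_count_summary() {
  ids=$(mktemp)
  rrna_gene_ids "$1" > "$ids"
  awk -F'\t' 'FILENAME == ARGV[1] { rrna[$1]; next }
              FNR <= 4 { next }
              { total += $2; if ($1 in rrna) r += $2 }
              END { printf "%d\t%d\n", r, total }' "$ids" "$2"
  rm -f "$ids"
}

# Print the count table with the rRNA gene rows removed.
drop_rrna_counts() {
  ids=$(mktemp)
  rrna_gene_ids "$1" > "$ids"
  awk -F'\t' 'FILENAME == ARGV[1] { rrna[$1]; next } !($1 in rrna)' "$ids" "$2"
  rm -f "$ids"
}
```

After `align_and_count`, something like `rrna_count_summary annotation.gtf patient_ReadsPerGene.out.tab` gives a quick view of how much of the library is rRNA, and `drop_rrna_counts annotation.gtf patient_ReadsPerGene.out.tab > patient_counts_no_rRNA.tab` produces the table to carry into the DE analysis.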