Hello, I am new to ribo-seq analysis. Here is my workflow:
- Read quality control was performed with FastQC.
- The reads were trimmed with cutadapt v5.0, removing short reads (<20 nt) and adapter sequences.
- FastQC was run again after adapter trimming.
- Deduplication was performed using seqkit v2.9.0.
- Reads were mapped to the hg38 genome assembly with the GENCODE v29 basic annotation using STAR v2.7.6a.
- The alignment results were stored in the Aligned.sortedByCoord.out.bam file.
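For readers who want to reproduce the steps above, here is a rough sketch that only builds the command lines (it does not run them). The file names, the adapter placeholder, the index directory, and the thread count are all hypothetical; the flags shown are standard ones for each tool, but the exact options used in this analysis were not posted.

```python
import shlex

def build_pipeline(fastq_in, adapter, genome_dir, prefix, min_len=20, threads=4):
    """Return the workflow as a list of argument lists (nothing is executed).

    All paths and the adapter sequence are placeholders supplied by the caller.
    """
    trimmed = f"{prefix}.trimmed.fastq.gz"
    dedup = f"{prefix}.dedup.fastq.gz"
    return [
        # QC on the raw reads
        ["fastqc", fastq_in],
        # Trim the 3' adapter and drop reads shorter than min_len
        ["cutadapt", "-a", adapter, "-m", str(min_len), "-o", trimmed, fastq_in],
        # QC again after trimming
        ["fastqc", trimmed],
        # Remove duplicate reads by sequence (-s)
        ["seqkit", "rmdup", "-s", "-o", dedup, trimmed],
        # Map with STAR and write a coordinate-sorted BAM
        ["STAR", "--runThreadN", str(threads),
         "--genomeDir", genome_dir,
         "--readFilesIn", dedup,
         "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", f"{prefix}."],
    ]

if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    for cmd in build_pipeline("sample.fastq.gz", "ADAPTER_SEQUENCE",
                              "star_index_hg38", "sample"):
        print(shlex.join(cmd))
```

With the prefix `sample.`, STAR would name its output `sample.Aligned.sortedByCoord.out.bam`.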
I found a tutorial for RibosomeProfilingQC and used it to analyze data downloaded from a journal. After removing rRNA, tRNA, snRNA, snoRNA, misc_RNA, and RepeatMasker annotations, I obtained a file named fil.bam.
My questions are:
- Since my input BAM file is already mapped (to hg38), do I need to map the fil.bam file again to hg38 using STAR as the tutorial suggests?
- I used hg38 and GENCODE v29 GTF files downloaded from Ensembl for mapping with STAR. However, the tutorial uses UCSC references. Could this cause problems with the downstream analysis?
Additionally, I’ve been receiving warning messages in my downstream analysis, and the QC shows that the data quality is not good. Could this be related to the mapping or reference files used?
> estimatePsite(bamfile, CDS, genome)
[1] 13
Warning message:
In .merge_two_Seqinfo_objects(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': GL000008.2, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000208.1, GL000213.1, GL000214.1, GL000216.2, GL000218.1, GL000219.1, GL000220.1, GL000221.1, GL000224.1, GL000225.1, GL000226.1, KI270302.1, KI270303.1, KI270304.1, KI270305.1, KI270310.1, KI270311.1, KI270312.1, KI270315.1, KI270316.1, KI270317.1, KI270320.1, KI270322.1, KI270329.1, KI270330.1, KI270333.1, KI270334.1, KI270335.1, KI270336.1, KI270337.1, KI270338.1, KI270340.1, KI270362.1, KI270363.1, KI270364.1, KI270366.1, KI270371.1, KI270372.1, KI270373.1, KI270374.1, KI270375.1, KI270376.1, KI270378.1, KI270379.1, KI270381.1, KI270382.1, KI270383.1, KI270384.1, KI270385.1, KI270386.1, KI270387.1, KI270388.1, KI270389.1, KI270390.1, KI270391.1, KI270392.1, KI270393.1, KI270394.1, KI270395.1, KI270396.1, KI270411.1, KI270412.1, KI270414.1, KI270417.1, KI270418.1, KI270419.1, KI270420.1, KI270422.1, KI270423.1, KI270424.1, KI270425.1, KI27042 [... truncated]
Thanks for helping out.
Hello, Jack. Thanks for taking the time to answer my question. I believe my dataset should be valid, as I downloaded it directly from the journal. Yesterday, I reran STAR with the parameter --outFilterMultimapNmax 1, and from reviewing the mapping results, I believe the mapping is good overall. However, I discovered that the warning was caused by the BAM file. When I examined the BAM header, I noticed that, in addition to the normal chromosome names, it contained unusual entries that match the names mentioned in the warning message.
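In case it helps anyone checking the same thing, here is a small sketch of how the header inspection could be done programmatically. It assumes the header text has been dumped first (for example with `samtools view -H fil.bam`), and it assumes GENCODE-style names where the primary chromosomes are chr1 to chr22, chrX, chrY, and chrM; the example header below is made up for illustration.

```python
# Assumed set of "conventional" chromosome names (GENCODE/UCSC style).
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}

def sq_names(header_text):
    """Collect the SN: value of every @SQ line in a SAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names

def split_contigs(header_text, canonical=CANONICAL):
    """Return (conventional names, all other names) in header order."""
    names = sq_names(header_text)
    kept = [n for n in names if n in canonical]
    extra = [n for n in names if n not in canonical]
    return kept, extra

if __name__ == "__main__":
    # Made-up header; the lengths are placeholders, not real contig sizes.
    example = "\n".join([
        "@HD\tVN:1.4\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:1000",
        "@SQ\tSN:GL000008.2\tLN:1000",
        "@SQ\tSN:KI270302.1\tLN:1000",
    ])
    kept, extra = split_contigs(example)
    print("conventional:", kept)
    print("other:", extra)
```

The names in the second list are the ones that would show up as sequence levels missing from a reference that only defines the conventional chromosomes.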
I ultimately used samtools to remove all sequences whose names start with GL or KI, keeping only the conventional chromosomes. I did find some posts discussing this, but do you think filtering them all out is a good choice?
After filtering them out, there are no more warning messages.
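The filtering itself was done with samtools, and the exact command is not shown here. As an illustration of the same idea, below is a sketch that works on SAM text (such as the output of `samtools view -h`) and drops both the @SQ header entries and the alignments for names starting with GL or KI. It assumes single-end reads, so the mate reference column is not checked, and the example records are invented.

```python
def filter_sam(lines, prefixes=("GL", "KI")):
    """Yield SAM lines, skipping @SQ entries and alignments on unwanted contigs.

    Assumes single-end reads: only the RNAME column (field 3) is inspected.
    """
    in_header = True
    for raw in lines:
        line = raw.rstrip("\n")
        if in_header and line.startswith("@"):
            if line.startswith("@SQ"):
                # Find the sequence name of this header entry
                sn = next((f[3:] for f in line.split("\t")[1:]
                           if f.startswith("SN:")), "")
                if sn.startswith(prefixes):
                    continue  # drop the header entry for this contig
            yield line
        else:
            in_header = False
            fields = line.split("\t")
            if len(fields) > 2 and fields[2].startswith(prefixes):
                continue  # drop alignments on unwanted contigs
            yield line

# Invented example records, for illustration only
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:GL000008.2\tLN:1000",
    "read1\t0\tchr1\t100\t255\t30M\t*\t0\t0\t*\t*",
    "read2\t0\tGL000008.2\t50\t255\t30M\t*\t0\t0\t*\t*",
]
filtered = list(filter_sam(example))
```

Dropping the @SQ lines matters as well as dropping the reads, since the sequence names reported in the warning come from the header.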
my code of building index and mapping
Hi Tommy, it depends on the analysis you are trying to do, but if you are comparing individual genes across conditions or looking for novel translated regions, then in my opinion there should be no issue with what you did.
Thanks, Jack, for your help! :)