Question: Will you critique/rate my mRNA seq alignment analysis?
3.9 years ago, aswartz85 wrote:

What I'm trying to do is really straightforward - align ribosome profiling reads (only mRNA fragments protected from RNase degradation are sequenced) to the mouse transcriptome. I've completed this using the following:

I first ran my FASTQ file through FastQC. I noticed there was a lot of adapter contamination, so I ran the file through cutadapt/trim_galore. The output file appeared free of Illumina adapters.

I then aligned to the transcriptome using HISAT2 with the genome/transcriptome index I downloaded from their website (the GRCm38 genome_snp_tran files [I think this is what I want in order to align to the transcriptome?]). My command was as follows:

hisat2 -x [genome/transcriptome index] -U [single end read file].fastq -S [output file name].sam

I then used samtools to convert the SAM file to a sorted BAM file.
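
The conversion step could look something like the sketch below. It is not necessarily what was run in the original post: the function name `sam_to_sorted_bam` and the file names in the usage line are hypothetical, and it assumes a samtools 1.x release, where `samtools sort` takes `-o` for the output file.

```shell
# Sketch: coordinate-sort a SAM file into BAM and index the result.
# Assumes samtools 1.x is on PATH; the function name is hypothetical.
sam_to_sorted_bam() {
  sam_in=$1     # SAM file written by the aligner
  bam_out=$2    # sorted BAM file to create
  # samtools sort reads SAM directly and writes BAM when -o ends in .bam
  samtools sort -o "$bam_out" "$sam_in" && samtools index "$bam_out"
}
```

Usage would be along the lines of `sam_to_sorted_bam aln.sam aln.sorted.bam`. The index is not needed by Cufflinks itself, but it is handy for viewing the alignments in a genome browser.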

I then estimated gene abundance using Cufflinks with the command:

cufflinks -G [transcriptome annotation].gtf [input sorted bam]

Ultimately, the results look okay. I did get a lot of unmapped reads (~30%) from the HISAT2 alignment. This may have to do with the fact that the mice I'm using are not C57BL/6. I don't know whether the HISAT2 genome/transcriptome build I'm using accounts for all of the SNPs in my strain.
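
One way to double-check the unmapped percentage independently of the aligner's summary is to count flags in the SAM output. The following is a minimal sketch for single-end reads: the function name `unmapped_fraction` is hypothetical, and it relies only on the SAM flag bits (0x4 for unmapped, 0x100 for secondary, 0x800 for supplementary), so that each read is counted once.

```shell
# Sketch: report total reads, unmapped reads, and percent unmapped
# from SAM text on stdin. The function name is hypothetical.
unmapped_fraction() {
  awk -F'\t' '
    /^@/ { next }                        # skip header lines
    int($2 / 256) % 2 == 1 { next }      # skip secondary alignments (0x100)
    int($2 / 2048) % 2 == 1 { next }     # skip supplementary alignments (0x800)
    { total++ }                          # one primary record per read
    int($2 / 4) % 2 == 1 { unmapped++ }  # flag bit 0x4 marks an unmapped read
    END {
      if (total == 0) { print "0\t0\t0.00"; exit }
      printf "%d\t%d\t%.2f\n", total, unmapped, 100 * unmapped / total
    }'
}
```

Run it as `unmapped_fraction < aln.sam`, where `aln.sam` stands in for the aligner output. `samtools flagstat` gives a similar summary for BAM files.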

Any suggestions?


That's a reasonable enough plan. I've done something similar but used STAR instead, which produces somewhat nicer results since it can soft-clip the alignments. I'm not the world's biggest fan of Cufflinks, but with a well-annotated mouse genome it's probably OK. I should note that, if possible, it's really nice to combine a standard mRNA-seq sample or two with your ribosome profiling, mostly because it makes it easier to determine which transcripts are actually being expressed to begin with.

written 3.9 years ago by Devon Ryan

I actually DO have mRNA sequencing data for this experiment, not just ribosome footprinting. I asked a separate question here about how to compare the two, given that my mRNA-seq data is reported as raw read counts while the ribosome footprinting data is reported in FPKM (which is how Cufflinks outputs it). But you make a good point: one is not better than the other, and the mRNA-seq data can simply be used to corroborate the ribosome footprinting data.

written 3.9 years ago by aswartz85
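
For comparing raw counts with FPKM values, one rough option is to convert the counts with the standard definition, FPKM = count * 1e9 / (length in bp * total counted reads). This is a minimal sketch under stated assumptions: the function name `counts_to_fpkm` and the three-column input layout (gene, length in bp, count, tab-separated) are hypothetical, and for single-end data a fragment is taken to be a read. Cufflinks estimates FPKM with its own statistical model, so values from this plain formula will not match its output exactly.

```shell
# Sketch: convert raw counts to FPKM using the plain textbook formula.
# Input on stdin (hypothetical layout): gene <TAB> length_bp <TAB> count
counts_to_fpkm() {
  awk -F'\t' '
    # store every row and accumulate the library total
    { gene[NR] = $1; len[NR] = $2; cnt[NR] = $3; total += $3 }
    END {
      for (i = 1; i <= NR; i++) {
        # skip rows that would divide by zero
        if (len[i] > 0 && total > 0)
          printf "%s\t%.2f\n", gene[i], cnt[i] * 1e9 / (len[i] * total)
      }
    }'
}
```

It would be called as `counts_to_fpkm < counts.tsv`. A common alternative is to quantify both libraries with the same tool, so that the two sets of values are computed the same way.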

Sounds like a reasonable pipeline. That's indeed quite a lot of unmapped reads. Have you tried BLASTing some of them to find out where they come from?

written 3.9 years ago by WouterDeCoster
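
To get a handful of unmapped reads into a form that can be pasted into BLAST, something like the following sketch might help. The function name `unmapped_to_fasta` and the default of 100 reads are hypothetical choices. It reads SAM text on stdin and keeps records whose flag has bit 0x4 (unmapped) set.

```shell
# Sketch: write the first N unmapped reads from SAM text as FASTA.
# The function name and the default N=100 are hypothetical choices.
unmapped_to_fasta() {
  max_reads=${1:-100}
  awk -F'\t' -v max="$max_reads" '
    /^@/ { next }               # skip header lines
    int($2 / 4) % 2 == 1 {      # flag bit 0x4 marks an unmapped read
      print ">" $1              # read name becomes the FASTA header
      print $10                 # column 10 holds the read sequence
      if (++n >= max) exit      # stop after the requested number of reads
    }'
}
```

For example, `unmapped_to_fasta 50 < aln.sam > unmapped_sample.fa` would write the first 50 unmapped reads, where both file names are placeholders.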

Hi, for the reads that didn't align, you can attempt de novo assembly using Trinity, which works well and runs in reasonable time. 30% is quite a chunk, but it could be explained if your model carries a genetic modification, for example an oncogene insert with Cre modification, or a CRISPR edit at a particular locus. In such a scenario the affected genes express transcripts that carry vector backbone sequence (antibiotic selection markers, viral promoters, etc.), and those reads fail to align. If that is your case, Trinity or any other de novo assembler can salvage the affected reads.

written 3.9 years ago by Amitm

You may want to check whether the Mouse Genomes Project has a reference genome for your specific strain. Aligning to it might help with the unmapped reads.

written 3.9 years ago by John