Question: RNA-seq mapping Reference genome
k.kathirvel93 (India) wrote 2.3 years ago:

Hi everyone,

I am using STAR, HISAT2, and two more tools for RNA-seq data analysis. My question is: should I align my draft genome against the GRCh38 reference annotation (.gtf) or the reference genome (.fa)? Which one is best?

rna-seq next-gen • 1.1k views

Draft genome sequence and transcript level? Can you elaborate on the situation? If you are asking about alignment in general, a GTF (Gene Transfer Format) file does not contain any sequence information, so for aligning a draft genome sequence the reference genome FASTA is the right file. Some tools use the GTF file as a guide to restrict the analysis to known transcripts only, which reduces the time required for the analysis.

written 2.3 years ago by Arup Ghosh
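To make the point about GTF content concrete, here is a minimal sketch that splits one GTF record into its nine tab-separated columns. The example record, the `GTF_COLUMNS` list, and the `parse_gtf_line` helper are made up for illustration and do not come from any real annotation or tool.

```python
# Minimal sketch: a GTF record holds coordinates and attributes, not bases.
# The record below is invented for illustration only.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]


def parse_gtf_line(line):
    """Return a dict for one GTF record, with start/end as integers."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


example = 'chr1\texample\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
record = parse_gtf_line(example)
print(record["seqname"], record["feature"], record["start"], record["end"])
```

Every column describes where a feature lies on the genome. The bases themselves exist only in the FASTA file, which is why an aligner always needs the FASTA and treats the GTF as optional guidance.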

Could you elaborate on the draft genome and the other reference genome (.fa)? Usually there is only one genome, one GTF file, and one or more FASTQ files.

modified 2.3 years ago • written 2.3 years ago by Jeffin Rockey

I am doing transcriptome analysis. Draft genome: sequenced from a patient sample. genome (.fa): the GRCh38 reference genome FASTA file. STAR clearly takes both the .gtf and the genome.fa file for reference mapping, but HISAT2 takes only genome.fa as the reference. Which one is correct? Thanks.

written 2.3 years ago by k.kathirvel93

That is not quite true. HISAT2 can also use the annotation: you generate a file of known splice junctions from a reference GTF file and supply it to HISAT2. You can run HISAT2 without that file and it will still perform spliced alignment, but known splice sites help it align RNA-seq reads that span exon-exon junctions. You can use both tools. My recommendation is this: look into the usage of both tools and choose the one you feel more comfortable with. Both are well accepted, well tested, and produce meaningful results. The downstream analysis matters much more than the alignment, so that is where you should focus. What exactly is your final goal?

written 2.3 years ago by ATpoint

My aim is differential gene expression analysis. I want to use both tools for comparison. Can I get the code for HISAT2 indexing with both the genome (.fa) and annotation (.gtf) references? Thanks.

modified 2.3 years ago • written 2.3 years ago by k.kathirvel93

Running ./hisat2-build without arguments prints its usage information, including the indexing options.

./hisat2_extract_splice_sites.py extracts the splice sites from the GTF.

written 2.3 years ago by ATpoint
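As a rough sketch of how those two scripts fit together, the snippet below only builds and prints the shell commands for an annotation-aware HISAT2 index; it does not run them. The file names (`GRCh38.fa`, `GRCh38.gtf`), the index prefix `grch38_tran`, and the helper `hisat2_index_commands` are placeholders assumed for illustration. `hisat2_extract_exons.py` and the `--ss` / `--exon` options of `hisat2-build` are part of HISAT2, but check them against the usage output of your installed version.

```python
import shlex


def hisat2_index_commands(genome_fa, gtf, index_prefix):
    """Build shell command strings for a HISAT2 index that includes
    known splice sites and exons taken from a GTF annotation."""
    splice_sites = index_prefix + ".ss"
    exons = index_prefix + ".exon"
    q = shlex.quote  # protect paths containing spaces or shell characters
    return [
        # Both helper scripts write to stdout, so redirect into files.
        f"hisat2_extract_splice_sites.py {q(gtf)} > {q(splice_sites)}",
        f"hisat2_extract_exons.py {q(gtf)} > {q(exons)}",
        # Index the reference FASTA together with the extracted annotation.
        f"hisat2-build --ss {q(splice_sites)} --exon {q(exons)} "
        f"{q(genome_fa)} {q(index_prefix)}",
    ]


for command in hisat2_index_commands("GRCh38.fa", "GRCh38.gtf", "grch38_tran"):
    print(command)
```

Building a human index with `--ss` and `--exon` can need far more memory than a plain index. If that is a problem, one alternative is to build the index from the FASTA alone and pass the splice-site file at alignment time with `hisat2 --known-splicesite-infile`.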
Devon Ryan (Freiburg, Germany) wrote 2.3 years ago:

I'm not sure why you bothered making a draft genome of a patient, but you'd be better off not using it when analysing your RNA-seq data. Just align to the reference genome and use one of the reference GTF files.
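For the STAR side, "reference genome plus reference GTF" could look roughly like the sketch below, which only assembles and prints the command lines. All file names, the `star_index` directory, the `sample1_` prefix, the thread count, the assumed 100 bp read length, and the two helper functions are placeholders for illustration. The STAR options themselves (`--runMode genomeGenerate`, `--sjdbGTFfile`, `--sjdbOverhang`, `--quantMode GeneCounts`) are real, but check them against the manual for your STAR version.

```python
import shlex


def star_index_command(genome_fa, gtf, genome_dir, read_length=100, threads=8):
    """Argument list for building a STAR index from reference FASTA + GTF."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fa,
        "--sjdbGTFfile", gtf,
        # The STAR manual recommends read length minus 1 for the overhang.
        "--sjdbOverhang", str(read_length - 1),
    ]


def star_align_command(genome_dir, fastq_1, fastq_2, out_prefix, threads=8):
    """Argument list for aligning one paired-end sample and counting reads per gene."""
    return [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_1, fastq_2,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "GeneCounts",
        "--outFileNamePrefix", out_prefix,
    ]


index_cmd = star_index_command("GRCh38.fa", "GRCh38.gtf", "star_index")
align_cmd = star_align_command("star_index", "sample1_R1.fastq", "sample1_R2.fastq", "sample1_")
for cmd in (index_cmd, align_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The per-gene counts written by `--quantMode GeneCounts` are one possible input for a differential expression analysis. The patient draft genome does not appear anywhere in this workflow.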
