Question: Reference genome for RNA-seq mapping
Asked 10 months ago by k.kathirvel93 (India):

Hi everyone,

I am using STAR, HISAT2 and two other tools for RNA-seq data analysis. My question is: should I align my draft genome against the GRCh38 reference annotation (.gtf) or the reference genome (.fa)? Which one is better?

Tags: rna-seq, next-gen

Do you mean a draft genome sequence, and are you working at the transcript level? Can you elaborate on the situation? If you are asking about alignment in general: a GTF (Gene Transfer Format) file does not contain any sequence information, so for aligning reads to a draft genome sequence, the reference genome (FASTA) is the right file. Some tools use the GTF file as a guide, e.g. to restrict the analysis to known transcripts only, which reduces the time required for analysis.

Comment by arup, 10 months ago
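To make the point about GTF content concrete, here is a minimal Python sketch (not part of any tool mentioned in this thread) that splits one GTF line into its nine tab-separated columns. The record and the IDs G1/T1 are hypothetical. Every field is a name, coordinate, strand or attribute; the actual bases must come from the genome FASTA, whose sequence names have to match the first GTF column.

```python
"""Parse one GTF record to show that it holds coordinates, not sequence."""
from dataclasses import dataclass


@dataclass
class GtfRecord:
    seqname: str      # chromosome/contig name; must match a FASTA header in the genome
    source: str
    feature: str      # e.g. "gene", "transcript", "exon"
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    score: str
    strand: str       # "+" or "-"
    frame: str
    attributes: dict  # e.g. {"gene_id": ..., "transcript_id": ...}


def parse_attributes(field: str) -> dict:
    """Parse 'key "value"; key2 "value2";' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line: str) -> GtfRecord:
    """Split a GTF line into its nine columns; no sequence is present anywhere."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    return GtfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], cols[7], parse_attributes(cols[8]))


if __name__ == "__main__":
    # Hypothetical exon record; IDs are made up for illustration.
    line = 'chr1\texample\texon\t1001\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
    rec = parse_gtf_line(line)
    print(rec.seqname, rec.feature, rec.start, rec.end, rec.attributes)
```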

Could you elaborate on the draft genome and the other genome (.fa) reference? Usually there is only one genome (FASTA), one GTF and one or more FASTQ files.

Comment by Jeffin Rockey, 10 months ago

I am doing transcriptome analysis. The draft genome was sequenced from a patient sample; genome.fa is the GRCh38 reference genome FASTA file. STAR clearly takes both the .gtf and the genome.fa file to build its mapping reference, but HISAT2 takes only genome.fa as the reference. Which approach is correct? Thanks.

Reply by k.kathirvel93, 10 months ago

That is not quite right. HISAT2 can use a file of known splice sites, which you generate from a reference GTF file. You can run HISAT2 without that file and it will still perform spliced alignment, but supplying the annotated splice sites helps it align RNA-seq reads that span known exon-exon junctions. You can use both tools. My recommendation: look into the usage of both tools and choose the one you feel more comfortable with. Both tools are well accepted and tested, and both produce meaningful results. Much more important than the alignment is the downstream analysis, which is where you should focus. What exactly is your final goal?

Reply by ATpoint, 10 months ago
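For intuition about what "splice sites from a GTF" means, here is a simplified Python sketch that groups exon records by transcript_id and reports the gap between consecutive exons of each transcript as an intron. It only illustrates the idea behind hisat2_extract_splice_sites.py; it is not that script, and the real script's coordinate convention and filtering rules may differ, so use the script itself for actual indexing. The function names and the example transcripts are hypothetical.

```python
"""Simplified sketch: derive introns (splice junctions) from GTF exon records."""
from collections import defaultdict


def parse_attrs(field):
    """Parse 'key "value"; key2 "value2";' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip().strip('"')
    return attrs


def introns_from_gtf(lines):
    """Return sorted (chrom, intron_start, intron_end, strand), 1-based inclusive."""
    exons = defaultdict(list)  # transcript_id -> [(chrom, start, end, strand), ...]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        tid = parse_attrs(cols[8]).get("transcript_id")
        if tid is None:
            continue
        exons[tid].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    junctions = set()  # a set, so junctions shared by several transcripts appear once
    for tx_exons in exons.values():
        tx_exons.sort(key=lambda e: e[1])  # genomic order, regardless of strand
        for (chrom, _, up_end, strand), (_, down_start, _, _) in zip(tx_exons, tx_exons[1:]):
            if down_start - up_end > 1:  # skip adjacent or overlapping exons
                junctions.add((chrom, up_end + 1, down_start - 1, strand))
    return sorted(junctions)


if __name__ == "__main__":
    # Hypothetical annotation: T2 skips the middle exon of T1.
    tmpl = 'chr1\texample\texon\t{}\t{}\t.\t+\t.\tgene_id "G1"; transcript_id "{}";'
    gtf = [tmpl.format(s, e, t) for t, s, e in
           [("T1", 100, 200), ("T1", 301, 400), ("T1", 501, 600),
            ("T2", 100, 200), ("T2", 501, 600)]]
    for junction in introns_from_gtf(gtf):
        print(*junction, sep="\t")
```

In this toy annotation, T1 contributes two introns and the exon-skipping T2 contributes a third, longer one; a read spanning any of them is a spliced read that an aligner has to place across the gap.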

My aim is differential gene expression analysis, and I want to use both tools for comparison. Could you share the commands for building a HISAT2 index with both the genome (.fa) and the annotation (.gtf) reference? Thanks.

Reply by k.kathirvel93, 10 months ago

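Running ./hisat2-build without arguments prints its usage information, including the indexing options.

./hisat2_extract_splice_sites.py extracts the splice sites from the GTF (and ./hisat2_extract_exons.py extracts the exons); pass their outputs to hisat2-build with --ss and --exon.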

Reply by ATpoint, 10 months ago
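A hedged sketch of that indexing sequence, written in Python so the command lines can be checked without HISAT2 installed. It assumes hisat2_extract_splice_sites.py, hisat2_extract_exons.py and hisat2-build are on your PATH; the file names genes.gtf and GRCh38.fa and the index prefix grch38_tran are placeholders. Each extraction script writes to standard output, so run() redirects that into the .ss and .exon files before calling hisat2-build.

```python
"""Sketch: HISAT2 index build that includes annotated splice sites and exons."""
import subprocess


def hisat2_index_commands(gtf, fasta, prefix):
    """Return (extract_steps, build_cmd) without running anything.

    extract_steps is a list of (argv, output_file); each script writes to stdout.
    """
    ss, exon = f"{prefix}.ss", f"{prefix}.exon"
    extract_steps = [
        (["hisat2_extract_splice_sites.py", gtf], ss),
        (["hisat2_extract_exons.py", gtf], exon),
    ]
    build_cmd = ["hisat2-build", "--ss", ss, "--exon", exon, fasta, prefix]
    return extract_steps, build_cmd


def run(gtf, fasta, prefix):
    """Execute the steps; requires HISAT2 to be installed (assumption)."""
    steps, build = hisat2_index_commands(gtf, fasta, prefix)
    for argv, out in steps:
        with open(out, "w") as fh:  # emulate the shell redirect "> out"
            subprocess.run(argv, stdout=fh, check=True)
    subprocess.run(build, check=True)


if __name__ == "__main__":
    # Placeholder file names; substitute your own GRCh38 FASTA and GTF.
    steps, build = hisat2_index_commands("genes.gtf", "GRCh38.fa", "grch38_tran")
    for argv, out in steps:
        print(" ".join(argv), ">", out)
    print(" ".join(build))
```

Note that, according to the HISAT2 manual, building a human-sized index with --ss/--exon needs far more memory than a plain index. A lighter alternative is to build a plain index and pass the splice-site file to hisat2 at alignment time with --known-splicesite-infile.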
Answer by Devon Ryan (Freiburg, Germany), 10 months ago:

I'm not sure why you bothered making a draft genome of a patient, but you would be better off not using it when analysing your RNA-seq data. Just align to the reference genome (GRCh38) and use one of the reference GTF files.
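As a sketch of that recommendation for STAR, the Python functions below only assemble the command lines; they run nothing. The paths are hypothetical placeholders, and setting --sjdbOverhang to read length minus 1 follows the usual STAR manual advice. --quantMode GeneCounts makes STAR write per-gene read counts (ReadsPerGene.out.tab), which can feed a differential expression analysis.

```python
"""Sketch: STAR commands for aligning to the reference genome with a reference GTF."""


def star_index_cmd(genome_dir, fasta, gtf, read_length, threads=8):
    """Build the genomeGenerate command; the GTF supplies annotated junctions."""
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    return ["STAR", "--runMode", "genomeGenerate",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--genomeFastaFiles", fasta,
            "--sjdbGTFfile", gtf,
            "--sjdbOverhang", str(read_length - 1)]  # usually read length - 1


def star_align_cmd(genome_dir, fastqs, prefix, threads=8):
    """Build the alignment command for one sample (single- or paired-end)."""
    cmd = ["STAR", "--runThreadN", str(threads),
           "--genomeDir", genome_dir,
           "--readFilesIn", *fastqs,
           "--outFileNamePrefix", prefix,
           "--outSAMtype", "BAM", "SortedByCoordinate",
           "--quantMode", "GeneCounts"]  # per-gene counts for DGE
    if all(f.endswith(".gz") for f in fastqs):
        cmd += ["--readFilesCommand", "zcat"]  # decompress gzipped FASTQ on the fly
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute your own files and read length.
    print(" ".join(star_index_cmd("star_grch38", "GRCh38.fa", "genes.gtf", read_length=101)))
    print(" ".join(star_align_cmd("star_grch38",
                                  ["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "sample_")))
```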
