RNA-seq mapping Reference genome
6.3 years ago
k.kathirvel93 ▴ 310

Hi everyone,

I am using STAR, HISAT2, and two other tools for RNA-seq data analysis. My question is: should I align my draft genome against the GRCh38 reference annotation (.gtf) or the reference genome (.fa)? Which one is best?

RNA-Seq next-gen rna-seq • 2.4k views

Draft genome sequence and transcript level? Can you elaborate on the situation? If you are asking about alignment in general: a Gene Transfer Format (GTF) file contains no sequence information, only feature coordinates, so for aligning a draft genome sequence the reference genome FASTA is the right file. Some tools use the GTF file as a guide to restrict the analysis to known transcripts only, which reduces the time required for analysis.
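To make the point about GTF content concrete, here is a minimal Python sketch that splits one GTF record into its nine standard tab-separated columns. The example record, including the IDs `geneA` and `txA`, is made up for illustration and is not taken from any real annotation. Note that none of the columns holds nucleotides.

```python
# Minimal sketch: a GTF record has nine tab-separated columns and no sequence.
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attributes",
]

def parse_gtf_line(line):
    """Split one GTF record into a dict keyed by the standard column names."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    record = dict(zip(GTF_COLUMNS, fields))
    # Coordinates are 1-based and inclusive in GTF.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record

# Hypothetical record, for illustration only.
example = 'chr1\texample\texon\t11869\t12227\t.\t+\t.\tgene_id "geneA"; transcript_id "txA";'
rec = parse_gtf_line(example)
print(rec["seqname"], rec["feature"], rec["start"], rec["end"], rec["strand"])
```

The bases themselves live only in the FASTA file, which is why an aligner always needs the .fa and can only use the .gtf as additional guidance.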

Could you elaborate on the draft genome and the other genome (.fa) reference? Usually there is only one genome, one GTF, and one or more FASTQ files.

I am doing transcriptome analysis. Draft genome: sequenced from a patient sample. genome (.fa): the GRCh38 reference genome FASTA file. STAR clearly takes both the .gtf and the genome.fa file for reference mapping, but HISAT2 takes only genome.fa as the reference. Which one is correct? Thanks.

That is not quite true. HISAT2 can also use a file of known splice junctions, which you generate from a reference GTF file. You can run HISAT2 without that file and it will still align spliced reads, but supplying known splice sites helps with reads that span exon-exon junctions, which matters for meaningful RNA-seq alignment. You can use both tools. My recommendation is the following: look into the usage of both tools and choose the one you feel more comfortable with. Both are well accepted, well tested, and produce meaningful results. Much more important than the alignment is the downstream analysis, which is what you should focus on. What exactly is your final goal?
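As a rough illustration of how known splice junctions can be derived from a GTF, here is a simplified Python sketch: it groups exon records by transcript and reports the gap between each pair of consecutive exons as one junction. It is not the HISAT2 helper script itself. The function name and the example records are hypothetical, and the 0-based output coordinates reflect my understanding of the splice-site file format, so check them against the real script before relying on them.

```python
from collections import defaultdict

def splice_sites_from_gtf_lines(lines):
    """Return sorted (chrom, left, right, strand) junctions from GTF exon records.

    left/right are 0-based positions of the last base of the upstream exon and
    the first base of the downstream exon (assumed convention, see lead-in).
    """
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end)]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        transcript_id = None
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "transcript_id":
                transcript_id = parts[1].strip('"')
        if transcript_id is None:
            continue
        key = (fields[0], fields[6], transcript_id)
        exons[key].append((int(fields[3]), int(fields[4])))

    junctions = set()
    for (chrom, strand, _), spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            # Simplification: skip overlapping or directly adjacent exons.
            if right_start > left_end + 1:
                junctions.add((chrom, left_end - 1, right_start - 1, strand))
    return sorted(junctions)

# Hypothetical three-exon transcript, for illustration only.
example = [
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA";',
    'chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "txA";',
    'chr1\texample\texon\t500\t600\t.\t+\t.\tgene_id "geneA"; transcript_id "txA";',
]
sites = splice_sites_from_gtf_lines(example)
for chrom, left, right, strand in sites:
    print(chrom, left, right, strand, sep="\t")
```

In practice you would use the script that ships with HISAT2 rather than this sketch; the point is only that the junction list comes entirely from the annotation, not from the genome sequence.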

My aim is differential gene expression analysis, and I want to use both tools for comparison. Can I get the code for HISAT2 indexing with both the genome (.fa) and annotation (.gtf) references? Thanks.

./hisat2-build will give you the information on the indexing.

./hisat2_extract_splice_sites.py extracts the splice sites from the GTF.
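Putting those two pieces together, the sketch below only assembles and prints the command lines for an annotation-aware HISAT2 index and a paired-end alignment; it does not run anything. All file names are placeholders, the helper name `hisat2_commands` is made up, and the thread count is an arbitrary assumption. The `--ss` and `--exon` options of `hisat2-build` and the `hisat2_extract_exons.py` script come from the HISAT2 distribution as far as I know, but check `hisat2-build --help` for your version.

```python
import shlex

def hisat2_commands(genome_fa, gtf, index_base, reads_1, reads_2, out_sam, threads=4):
    """Return shell command strings for an annotation-aware HISAT2 run."""
    splice_sites = f"{index_base}.ss.txt"
    exons = f"{index_base}.exon.txt"
    q = shlex.quote
    return [
        # 1. Derive splice sites and exons from the annotation.
        f"hisat2_extract_splice_sites.py {q(gtf)} > {q(splice_sites)}",
        f"hisat2_extract_exons.py {q(gtf)} > {q(exons)}",
        # 2. Build the index from the genome FASTA plus the two derived files.
        shlex.join(["hisat2-build", "-p", str(threads),
                    "--ss", splice_sites, "--exon", exons,
                    genome_fa, index_base]),
        # 3. Align paired-end reads against that index.
        shlex.join(["hisat2", "-p", str(threads), "-x", index_base,
                    "-1", reads_1, "-2", reads_2, "-S", out_sam]),
    ]

# Placeholder file names, for illustration only.
cmds = hisat2_commands("GRCh38.fa", "GRCh38.gtf", "grch38_tran",
                       "sample_1.fq.gz", "sample_2.fq.gz", "sample.sam")
print("\n".join(cmds))
```

Building a human index with splice sites and exons is reported to need considerably more memory than a plain index. If that is a problem, one alternative is to build a plain index with `hisat2-build genome.fa index_base` and pass the splice-site file at alignment time with `--known-splicesite-infile`.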

6.3 years ago

I'm not sure why you bothered making a draft genome of a patient, but you'd be better off not using it when analysing your RNA-seq data. Just align to the reference genome and use one of the reference GTF files.
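For the STAR side of that advice, here is a comparable sketch that assembles (but does not run) the two usual commands: genome generation from the reference FASTA plus GTF, then alignment of paired-end reads. File names, the helper name `star_commands`, the thread count, and the 101 bp read length are all assumptions for illustration; `--sjdbOverhang` is conventionally set to read length minus one. `--quantMode GeneCounts` is included only because the stated goal is differential expression.

```python
import shlex

def star_commands(genome_fa, gtf, genome_dir, reads_1, reads_2, out_prefix,
                  read_length=101, threads=4):
    """Return shell command strings for STAR indexing and alignment."""
    generate = shlex.join([
        "STAR", "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fa,   # sequence comes from the FASTA
        "--sjdbGTFfile", gtf,              # known junctions come from the GTF
        "--sjdbOverhang", str(read_length - 1),
    ])
    align = shlex.join([
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", reads_1, reads_2,
        "--readFilesCommand", "zcat",      # assumes gzip-compressed FASTQ
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "GeneCounts",       # per-gene read counts for DE analysis
        "--outFileNamePrefix", out_prefix,
    ])
    return [generate, align]

# Placeholder file names, for illustration only.
star_cmds = star_commands("GRCh38.fa", "GRCh38.gtf", "star_index",
                          "sample_1.fq.gz", "sample_2.fq.gz", "sample_")
print("\n".join(star_cmds))
```

Both aligners therefore take the same two reference inputs: the genome FASTA for the sequence and the GTF for known splice junctions.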
