Aligning using hg18
1
0
Entering edit mode
5.2 years ago
lamia_203 ▴ 100

Hi,

I'm trying to align reads from a study but they have aligned it to hg18. I managed to get the hg18.fa file in and want to align the samples to it. The code I am using is:

    for i in *.fq.gz; do STAR 
--./hg18.fa 
--runMode alignReads 
--readFilesIn $i 
--readFilesCommand zcat 
--outSAMtype BAM SortedByCoordinate 
--chimSegmentMin 20 
--quantMode GeneCounts 
--outReadsUnmapped Fastx 
--outFileNamePrefix ./read_out2/$i; done

However, this is failing as error messages are occuring such as

EXITING: FATAL INPUT ERROR: empty value for parameter "./hg18.fa" in input "Command-Line-Initial"
SOLUTION: use non-empty value for this parameter

I assume this is because it is not annotated and indexed. When looking online I am trying to find annotations but cannot find it clearly.

Many thanks.

alignment annotation hg18 RNA-Seq • 2.4k views
ADD COMMENT
4
Entering edit mode

I assume this is because it is not annotated and indexed.

It's not. It's because the parameter is being specified wrongly. I don't think STAR (or any unix tool for that matter) accepts an input file preceded by --. -- is the prefix for long-form options, such as --readFilesCommand, which accept a value such as zcat (or serve as a on/off flag). You're looking for the --genomeFastaFiles parameter, so --genomeFastaFiles ./hg18.fa

ADD REPLY
3
Entering edit mode

At this point there is really no reason to align data to hg18. That release is a decade old and has been superseded by two major releases.

If you must, here is the GFF file from NCBI for build 36.3. Create STAR genome index separately.

ADD REPLY
0
Entering edit mode

Many thanks! Yes, I tried aligning it to recent genome GRCh38 but the counts all appeared as 0 for genes and were all unmapped after QC. So I am trying to align the samples to the genome that the study have used. I am not sure why there were unmapped reads for GRCh38 as I had used it to previously aligned it successfully.

ADD REPLY
1
Entering edit mode

I tried aligning it to recent genome GRCh38 but the counts all appeared as 0 for genes

Genome builds have changed over time but human genome is still just that. You may get some differences in alignment percentages but getting zero counts indicates there was some problem with the annotation file you used and the genome reference you aligned to. (e.g. UCSC uses chr1 format where as NCBI/Ensembl use 1 etc.) Most likely cause of the zero counts is this sort of mismatch.

I suggest that you stop pursuing alignments to hg18 and diagnose why you are not getting counts with GRCh38 alignments (I assume the alignments themselves are fine or not?). Search past biostars posts for pointers. If you are still not able to sort the problem out then post a new question. Post details of alignment command, statistics, sources of reference genome and annotation (first few lines of annotation file).

ADD REPLY
0
Entering edit mode

The genome was used before and aligned fine and gave reads at expected genes for another study I carried out (beta cells), however I am also unsure as to why it didn't map the reads I provided (they are human as well). The parameters were kept the same as before as well:

for i in *.fq.gz; do STAR --human_index --runMode alignReads --readFilesIn $i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --chimSegmentMin 20 --quantMode GeneCounts --outReadsUnmapped Fastx --outFileNamePrefix ./read_out2/$i; done

I

ADD REPLY
0
Entering edit mode

Take a few random reads from your data file and blast them at NCBI to make sure you have the right genome/dataset. If the genome index has worked before then something is up with the data. Switching to hg18 will not fix it.

ADD REPLY
0
Entering edit mode

Are you sure you are passing the index correctly to STAR? There is no --human_index argument, as there is no --./hg18.fa argument. You have to index the genome, the pass the directory where the index is located with --genomeDir /path/to/index.

ADD REPLY
1
Entering edit mode
5.2 years ago
h.mon 35k

You have to first build the index for the reference genome, with:

STAR --runThreadN NumberOfThreads
  --runMode genomeGenerate
  --genomeDir /path/to/genomeDir
  --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 ...
  --sjdbGTFfile /path/to/annotations.gtf
  --sjdbOverhang ReadLength-1

After you built the index, you can align the reads:

STAR --genomeDir /path/to/genomeDir
  --runMode alignReads
  --readFilesIn $i 
  --readFilesCommand zcat 
  --outSAMtype BAM SortedByCoordinate 
  --chimSegmentMin 20 
  --quantMode GeneCounts 
  --outReadsUnmapped Fastx 
  --outFileNamePrefix ./read_out2/$i

Where did you download the genome from? In general, the same site will have an accompanying annotation. But as genomax advised, better use a newer genome version.

ADD COMMENT
0
Entering edit mode

Thank you. I downloaded it from UCSC I believe, I couldn't find the gtf files clearly on there. This is my first time downloading genomes hence why I am unclear.

ADD REPLY
0
Entering edit mode

See Use genePredToGtf with a downloaded genePred table. And use a newer version of the human genome.

ADD REPLY

Login before adding your answer.

Traffic: 1928 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6