Question: Aligning using hg18
0
gravatar for lamia_203
7 months ago by
lamia_20330
lamia_20330 wrote:

Hi,

I'm trying to align reads from a study but they have aligned it to hg18. I managed to get the hg18.fa file in and want to align the samples to it. The code I am using is:

    for i in *.fq.gz; do STAR 
--./hg18.fa 
--runMode alignReads 
--readFilesIn $i 
--readFilesCommand zcat 
--outSAMtype BAM SortedByCoordinate 
--chimSegmentMin 20 
--quantMode GeneCounts 
--outReadsUnmapped Fastx 
--outFileNamePrefix ./read_out2/$i; done

However, this is failing as error messages are occuring such as

EXITING: FATAL INPUT ERROR: empty value for parameter "./hg18.fa" in input "Command-Line-Initial"
SOLUTION: use non-empty value for this parameter

I assume this is because it is not annotated and indexed. When looking online I am trying to find annotations but cannot find it clearly.

Many thanks.

ADD COMMENTlink modified 7 months ago by h.mon27k • written 7 months ago by lamia_20330
4

I assume this is because it is not annotated and indexed.

It's not. It's because the parameter is being specified wrongly. I don't think STAR (or any unix tool for that matter) accepts an input file preceded by --. -- is the prefix for long-form options, such as --readFilesCommand, which accept a value such as zcat (or serve as a on/off flag). You're looking for the --genomeFastaFiles parameter, so --genomeFastaFiles ./hg18.fa

ADD REPLYlink written 7 months ago by RamRS24k
3

At this point there is really no reason to align data to hg18. That release is a decade old and has been superseded by two major releases.

If you must, here is the GFF file from NCBI for build 36.3. Create STAR genome index separately.

ADD REPLYlink modified 7 months ago • written 7 months ago by genomax71k

Many thanks! Yes, I tried aligning it to recent genome GRCh38 but the counts all appeared as 0 for genes and were all unmapped after QC. So I am trying to align the samples to the genome that the study have used. I am not sure why there were unmapped reads for GRCh38 as I had used it to previously aligned it successfully.

ADD REPLYlink written 7 months ago by lamia_20330
1

I tried aligning it to recent genome GRCh38 but the counts all appeared as 0 for genes

Genome builds have changed over time but human genome is still just that. You may get some differences in alignment percentages but getting zero counts indicates there was some problem with the annotation file you used and the genome reference you aligned to. (e.g. UCSC uses chr1 format where as NCBI/Ensembl use 1 etc.) Most likely cause of the zero counts is this sort of mismatch.

I suggest that you stop pursuing alignments to hg18 and diagnose why you are not getting counts with GRCh38 alignments (I assume the alignments themselves are fine or not?). Search past biostars posts for pointers. If you are still not able to sort the problem out then post a new question. Post details of alignment command, statistics, sources of reference genome and annotation (first few lines of annotation file).

ADD REPLYlink modified 7 months ago • written 7 months ago by genomax71k

The genome was used before and aligned fine and gave reads at expected genes for another study I carried out (beta cells), however I am also unsure as to why it didn't map the reads I provided (they are human as well). The parameters were kept the same as before as well:

for i in *.fq.gz; do STAR --human_index --runMode alignReads --readFilesIn $i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --chimSegmentMin 20 --quantMode GeneCounts --outReadsUnmapped Fastx --outFileNamePrefix ./read_out2/$i; done

I

ADD REPLYlink written 7 months ago by lamia_20330

Take a few random reads from your data file and blast them at NCBI to make sure you have the right genome/dataset. If the genome index has worked before then something is up with the data. Switching to hg18 will not fix it.

ADD REPLYlink modified 7 months ago • written 7 months ago by genomax71k

Are you sure you are passing the index correctly to STAR? There is no --human_index argument, as there is no --./hg18.fa argument. You have to index the genome, the pass the directory where the index is located with --genomeDir /path/to/index.

ADD REPLYlink written 7 months ago by h.mon27k
1
gravatar for h.mon
7 months ago by
h.mon27k
Brazil
h.mon27k wrote:

You have to first build the index for the reference genome, with:

STAR --runThreadN NumberOfThreads
  --runMode genomeGenerate
  --genomeDir /path/to/genomeDir
  --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 ...
  --sjdbGTFfile /path/to/annotations.gtf
  --sjdbOverhang ReadLength-1

After you built the index, you can align the reads:

STAR --genomeDir /path/to/genomeDir
  --runMode alignReads
  --readFilesIn $i 
  --readFilesCommand zcat 
  --outSAMtype BAM SortedByCoordinate 
  --chimSegmentMin 20 
  --quantMode GeneCounts 
  --outReadsUnmapped Fastx 
  --outFileNamePrefix ./read_out2/$i

Where did you download the genome from? In general, the same site will have an accompanying annotation. But as genomax advised, better use a newer genome version.

ADD COMMENTlink written 7 months ago by h.mon27k

Thank you. I downloaded it from UCSC I believe, I couldn't find the gtf files clearly on there. This is my first time downloading genomes hence why I am unclear.

ADD REPLYlink modified 7 months ago by RamRS24k • written 7 months ago by lamia_20330

See Use genePredToGtf with a downloaded genePred table. And use a newer version of the human genome.

ADD REPLYlink written 7 months ago by h.mon27k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 598 users visited in the last hour