I'm trying to align reads from a study but they have aligned it to hg18. I managed to get the hg18.fa file in and want to align the samples to it. The code I am using is:
for i in *.fq.gz;do STAR
--./hg18.fa
--runMode alignReads
--readFilesIn $i
--readFilesCommand zcat
--outSAMtype BAM SortedByCoordinate
--chimSegmentMin 20
--quantMode GeneCounts
--outReadsUnmapped Fastx
--outFileNamePrefix ./read_out2/$i;done
However, this is failing as error messages are occuring such as
EXITING: FATAL INPUT ERROR: empty value for parameter "./hg18.fa"in input "Command-Line-Initial"
SOLUTION: use non-empty value for this parameter
I assume this is because it is not annotated and indexed. When looking online I am trying to find annotations but cannot find it clearly.
I assume this is because it is not annotated and indexed.
It's not. It's because the parameter is being specified wrongly. I don't think STAR (or any unix tool for that matter) accepts an input file preceded by --. -- is the prefix for long-form options, such as --readFilesCommand, which accept a value such as zcat (or serve as a on/off flag). You're looking for the --genomeFastaFiles parameter, so --genomeFastaFiles ./hg18.fa
Many thanks! Yes, I tried aligning it to recent genome GRCh38 but the counts all appeared as 0 for genes and were all unmapped after QC. So I am trying to align the samples to the genome that the study have used. I am not sure why there were unmapped reads for GRCh38 as I had used it to previously aligned it successfully.
I tried aligning it to recent genome GRCh38 but the counts all
appeared as 0 for genes
Genome builds have changed over time but human genome is still just that. You may get some differences in alignment percentages but getting zero counts indicates there was some problem with the annotation file you used and the genome reference you aligned to. (e.g. UCSC uses chr1 format where as NCBI/Ensembl use 1 etc.) Most likely cause of the zero counts is this sort of mismatch.
I suggest that you stop pursuing alignments to hg18 and diagnose why you are not getting counts with GRCh38 alignments (I assume the alignments themselves are fine or not?). Search past biostars posts for pointers. If you are still not able to sort the problem out then post a new question. Post details of alignment command, statistics, sources of reference genome and annotation (first few lines of annotation file).
The genome was used before and aligned fine and gave reads at expected genes for another study I carried out (beta cells), however I am also unsure as to why it didn't map the reads I provided (they are human as well). The parameters were kept the same as before as well:
for i in *.fq.gz;do STAR --human_index --runMode alignReads --readFilesIn $i --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --chimSegmentMin 20 --quantMode GeneCounts --outReadsUnmapped Fastx --outFileNamePrefix ./read_out2/$i;done
Take a few random reads from your data file and blast them at NCBI to make sure you have the right genome/dataset. If the genome index has worked before then something is up with the data. Switching to hg18 will not fix it.
Are you sure you are passing the index correctly to STAR? There is no --human_index argument, as there is no --./hg18.fa argument. You have to index the genome, the pass the directory where the index is located with --genomeDir /path/to/index.
Where did you download the genome from? In general, the same site will have an accompanying annotation. But as genomax advised, better use a newer genome version.
Thank you. I downloaded it from UCSC I believe, I couldn't find the gtf files clearly on there. This is my first time downloading genomes hence why I am unclear.
It's not. It's because the parameter is being specified wrongly. I don't think STAR (or any unix tool for that matter) accepts an input file preceded by
--
.--
is the prefix for long-form options, such as--readFilesCommand
, which accept a value such aszcat
(or serve as a on/off flag). You're looking for the--genomeFastaFiles
parameter, so--genomeFastaFiles ./hg18.fa
At this point there is really no reason to align data to hg18. That release is a decade old and has been superseded by two major releases.
If you must, here is the GFF file from NCBI for build 36.3. Create STAR genome index separately.
Many thanks! Yes, I tried aligning it to recent genome GRCh38 but the counts all appeared as 0 for genes and were all unmapped after QC. So I am trying to align the samples to the genome that the study have used. I am not sure why there were unmapped reads for GRCh38 as I had used it to previously aligned it successfully.
Genome builds have changed over time but human genome is still just that. You may get some differences in alignment percentages but getting zero counts indicates there was some problem with the annotation file you used and the genome reference you aligned to. (e.g. UCSC uses
chr1
format where as NCBI/Ensembl use1
etc.) Most likely cause of the zero counts is this sort of mismatch.I suggest that you stop pursuing alignments to
hg18
and diagnose why you are not getting counts with GRCh38 alignments (I assume the alignments themselves are fine or not?). Search past biostars posts for pointers. If you are still not able to sort the problem out then post a new question. Post details of alignment command, statistics, sources of reference genome and annotation (first few lines of annotation file).The genome was used before and aligned fine and gave reads at expected genes for another study I carried out (beta cells), however I am also unsure as to why it didn't map the reads I provided (they are human as well). The parameters were kept the same as before as well:
I
Take a few random reads from your data file and blast them at NCBI to make sure you have the right genome/dataset. If the genome index has worked before then something is up with the data. Switching to
hg18
will not fix it.Are you sure you are passing the index correctly to STAR? There is no
--human_index
argument, as there is no--./hg18.fa
argument. You have to index the genome, the pass the directory where the index is located with--genomeDir /path/to/index
.