10 months ago
amy__
Hi,
I need the hg38 reference FASTA file. Does anyone know which download link on this page it would be? https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/
Or if these are even the correct files?
Thanks, Amy
Hello,
Choose the 5th sequence from the top.
Read the README file for your reference.
Thanks @sunnykev97, I did think it was that one! Thanks, Amy
Someone's told me to use GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
So I'm still unsure! I have read the README, but I'm still not sure which file to use for WES germline analysis.
Oh wait, they may be correct. From the README: "The no_alt_analysis_set contains the sequences, in FASTA format, of the chromosomes, mitochondrial genome, unlocalized scaffolds, and unplaced scaffolds. The alternate locus scaffolds are omitted because many Next Generation Sequence read alignment pipelines are incompatible with the full assembly model."
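For anyone who wants to script the download, here is a minimal sketch. It simply joins the directory URL from the question with the no_alt_analysis_set filename mentioned above. The function names `reference_url` and `download_reference` are made up for illustration, and the download has not been tested against the server, so treat it as a starting point rather than a recipe.

```python
import urllib.request

# Directory URL taken from the question; filename taken from the thread.
BASE_URL = (
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/"
    "GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/"
)
FILENAME = "GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz"


def reference_url(base=BASE_URL, filename=FILENAME):
    """Join the directory URL and the filename with exactly one slash."""
    return base.rstrip("/") + "/" + filename


def download_reference(dest=FILENAME):
    """Download the gzipped FASTA to `dest` (hypothetical helper).

    The file is large, so this may take a while on a slow connection.
    """
    urllib.request.urlretrieve(reference_url(), dest)
    return dest


# Usage (not run here because it needs network access):
#   path = download_reference()
```

After downloading, the file still has to be decompressed and indexed for whichever aligner you use.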
Well, Two types of genome assembly
If you like the post, upvote.
No, it's not 'good', as this information requires special alignment procedures that are not trivial and not implemented in most aligners. It even leads to false alignment results with standard aligners, because reads from these loci would come out as multimappers. For most applications, use the one without ALT.
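If you are unsure whether a reference you already have contains ALT contigs, you can scan its FASTA headers. The sketch below assumes the UCSC-style names used in the `ucsc_ids` directory, where ALT scaffolds end in `_alt`, decoys in `_decoy`, unlocalized scaffolds in `_random`, and unplaced scaffolds start with `chrUn_`. The function names are hypothetical, and other naming schemes (for example plain accession IDs) would need different rules.

```python
import gzip

# Assumption: UCSC-style names for the assembled chromosomes and mitochondrion.
CHROMOSOMES = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def contig_kind(name):
    """Classify a contig by its UCSC-style name (naming rules are assumed)."""
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_decoy"):
        return "decoy"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    if name in CHROMOSOMES:
        return "chromosome"
    return "other"


def alt_contigs(fasta_path):
    """Return the names of all ALT contigs found in a (optionally gzipped) FASTA."""
    opener = gzip.open if fasta_path.endswith(".gz") else open
    found = []
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # The contig name is the first whitespace-delimited token.
                name = line[1:].split()[0]
                if contig_kind(name) == "alt":
                    found.append(name)
    return found
```

An empty list from `alt_contigs` suggests the file is safe to use with an aligner that is not ALT-aware; a non-empty list means you either need an ALT-aware workflow or the no-ALT file.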
As ATpoint said, ALT information is tricky to deal with. This blog post elaborates nicely on this issue of choosing a good reference genome.
Thank you all, I appreciate the help!
So would you not recommend this tutorial, since it uses GRCh38 with alternate contigs to map reads?