Please recommend a tutorial/course on genetic data analysis
2.5 years ago
Qbit- • 0

Hello everyone! I'm by no means a bioinformatician, but I would like to learn some of the craft (my background is chemistry/computer science/machine learning; I do ML-supported drug design).

I would like to analyse human genetic data. Specifically, the task is as follows: given a pair of FASTQ files (produced by Illumina), I would like to get a list of the mutations in each gene present in the data, either in the form GENE <position> A>C or as an rs number. The data contain cDNA reads.

So far I have been able to run this pipeline to completion: https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/ However, I cannot make sense of the results. The pipeline produced several VCF files, but the SNPs do not seem to be annotated with genes, or at least I cannot read them correctly :(. I used this reference genome: https://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz

Could you please recommend an easy pipeline/tutorial/course for learning how to do a basic SNP analysis? Or advise me on how to use the tools mentioned above?

fastq pipeline bwa • 873 views
2.5 years ago

The sandbox.bio SNP alignment tutorial (https://sandbox.bio/tutorials?id=dna-secrets) is very nice. It uses the lambda phage reference genome, which is small enough to run on your computer. Learning with the much larger human genome takes longer but doesn't teach you much more.

Looking at your tutorial: raw_snps.vcf will not contain any gene names, just the positions of SNPs in the reference genome. filtered_snps_final.ann.vcf should contain gene names, but I expect the majority of SNPs to be annotated as up- or downstream of genes, just like in 'real life'.
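To get from the annotated VCF to lines like GENE <position> A>C, something along these lines might work. It is only a rough sketch and assumes the .ann.vcf was annotated by SnpEff, which writes an ANN entry into the INFO column with pipe-separated subfields, the fourth of which is the gene name. The function names and the example line are made up for illustration, and the position printed is the one in the POS column (relative to whatever reference sequence you aligned to), not a position within the gene.

```python
def parse_ann_genes(vcf_line):
    """Split one VCF data line into (chrom, pos, ref, alt, gene_names).

    Assumes a SnpEff-style INFO entry of the form
    ANN=Allele|Annotation|Impact|Gene_Name|..., with several
    annotations separated by commas.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    info = fields[7]
    genes = []
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        for annotation in entry[len("ANN="):].split(","):
            parts = annotation.split("|")
            # parts[3] is the gene name in the SnpEff ANN layout
            if len(parts) > 3 and parts[3] and parts[3] not in genes:
                genes.append(parts[3])
    return chrom, int(pos), ref, alt, genes


def format_variant(vcf_line):
    """Render one VCF data line as 'GENE position REF>ALT'."""
    chrom, pos, ref, alt, genes = parse_ann_genes(vcf_line)
    # Fall back to the contig name when no gene annotation is present
    label = ",".join(genes) if genes else chrom
    return f"{label} {pos} {ref}>{alt}"


def iter_variants(path):
    """Yield formatted variants from a plain-text (uncompressed) VCF file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            yield format_variant(line)


# Made-up example line, for illustration only (not real data)
EXAMPLE = "\t".join([
    "chr1", "12345", ".", "A", "C", "50", "PASS",
    "AC=1;ANN=C|missense_variant|MODERATE|GENE1|ID1|transcript|TX1|"
    "protein_coding|1/5|c.10A>C|p.Lys4Gln|10/100|10/90|4/30||",
    "GT", "0/1",
])

if __name__ == "__main__":
    print(format_variant(EXAMPLE))
```

On your own data you would loop over iter_variants("filtered_snps_final.ann.vcf"). If a site has several ALT alleles they are printed comma-separated as they appear in the file. rs numbers, when present, are in the third (ID) column, but that column is only filled in if the VCF was annotated against a database such as dbSNP.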


Thanks for the great links!
