SNP analysis with an assembly
12 months ago
luzglongoria ▴ 50

Hi there,

I am new to SNP analysis, so before starting I would like to check whether my pipeline is correct. What I have now: RNA-seq samples (.fq.gz) + a Trinity assembly (built from those reads).

My model organism does not have an assembled genome, which is why I need to use the Trinity assembly.

The steps would be as follows:

1) The idea is to use this Trinity assembly as the reference for the SNP analysis. For the mapping I would use Bowtie, since it is recommended for RNA samples and (as far as I know) supports RNA assemblies. I would get a .bam file as output.

2) Then I'd use the .bam files for the variant calling. At this step I'm not sure which software to use: SAMtools, GATK, or FreeBayes.

3) I have read that at this step the SNPs need to be filtered on various criteria, such as read depth, mapping quality, and allele frequency, to remove potential false positives and low-quality variants. I'm not sure which software I need to use here. I'm mainly interested in allele frequency (in case there is specific software for that kind of analysis).

4) I would also like to perform a population-level analysis with my several individuals (the same individuals sampled at different time points). Is it correct to use tools like PLINK, VCFtools, or ADMIXTURE for these analyses?

Any help is more than welcome.

Thank you so much in advance.

SNP Trinity Bowtie • 1.1k views
12 months ago
LChart 3.9k

You are using RNA-seq, and are therefore assembling the transcriptome; you can use the resulting scaffolds as a reference for other RNA-seq data. As such, any variant detection you perform should be consistent with best practices for variant detection from RNA-seq (https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-). Note that the aligner recommended there ("STAR 2-PASS") is unavailable to you, as it requires both a genome and a transcriptome, so you can replace it with any aligner that takes a single fasta (I believe kallisto just uses transcript fastas).
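As a rough illustration of "an aligner that takes a single fasta", here is a minimal sketch that builds the two Bowtie2 command lines (index the Trinity fasta, then align one sample against it). Assumptions not taken from the thread: Bowtie2 rather than the original Bowtie, the function name, the index prefix, and the file names are all placeholders. The read-group options are included because GATK expects read groups in its input.

```python
def bowtie2_commands(assembly_fasta, sample, reads_1, reads_2=None,
                     index_prefix="trinity_index"):
    """Build (but do not run) bowtie2 commands for one sample.

    index_prefix and the output name are hypothetical placeholders.
    """
    # Index the transcriptome assembly once; every sample reuses the index.
    build = ["bowtie2-build", assembly_fasta, index_prefix]

    # Tag reads with a read group so downstream callers know the sample.
    align = ["bowtie2", "-x", index_prefix,
             "--rg-id", sample, "--rg", f"SM:{sample}"]
    if reads_2 is None:
        align += ["-U", reads_1]                  # single-end reads
    else:
        align += ["-1", reads_1, "-2", reads_2]   # paired-end reads
    align += ["-S", f"{sample}.sam"]              # SAM output
    return [build, align]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` once the file names match your data.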

Your settings may depend on the ploidy of your organism, and you should be very cognizant of genotype likelihoods. In fact, it is preferable to use a "likelihood-aware" or "dosage-aware" methodology for downstream population genetics.
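To make "dosage-aware" concrete, here is a minimal sketch that turns the phred-scaled genotype likelihoods (the PL field of a VCF) into an expected alternate-allele dosage instead of a hard genotype call. It assumes a diploid organism, a biallelic site, and a flat prior over the three genotypes; the function names are made up for this illustration.

```python
def genotype_posteriors(pl):
    """Convert phred-scaled likelihoods (PL) to normalized probabilities.

    Assumes a flat prior, so posteriors are just rescaled likelihoods.
    """
    likelihoods = [10 ** (-p / 10) for p in pl]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]


def expected_alt_dosage(pl):
    """Expected number of ALT alleles (0 to 2) for a diploid, biallelic site.

    pl is ordered as hom-ref, het, hom-alt.
    """
    p_ref, p_het, p_alt = genotype_posteriors(pl)
    return p_het + 2 * p_alt
```

A confident heterozygote such as `PL = (60, 0, 60)` gives a dosage very close to 1, while a low-coverage site with `PL = (0, 3, 30)` gives a fractional dosage that carries the uncertainty forward rather than rounding it away.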

12 months ago
Vic ▴ 100

Once you have your .bam files, you can use samtools to filter reads by mapping quality (for example, MAPQ > 20), and then sort and index your bam files before running your data through your chosen GATK pipeline.

Examples of samtools commands are all over Biostars, for example in the answers here.
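A minimal sketch of the filter, sort, and index steps, written as a helper that only builds the samtools command lines. The function name and the output naming scheme are placeholders. Note that `samtools view -q 20` keeps reads with MAPQ of 20 or higher.

```python
def samtools_prep_commands(bam, min_mapq=20):
    """Build (but do not run) samtools commands to filter, sort, and index."""
    stem = bam[:-4] if bam.endswith(".bam") else bam
    filtered = f"{stem}.q{min_mapq}.bam"
    sorted_bam = f"{stem}.q{min_mapq}.sorted.bam"
    return [
        # Keep only alignments with MAPQ >= min_mapq, written as BAM.
        ["samtools", "view", "-b", "-q", str(min_mapq), "-o", filtered, bam],
        # Coordinate-sort the filtered alignments.
        ["samtools", "sort", "-o", sorted_bam, filtered],
        # Index the sorted BAM (creates a .bai next to it).
        ["samtools", "index", sorted_bam],
    ]
```

Run each command in order with `subprocess.run(cmd, check=True)`; the same `view` command also accepts a SAM file as input if your aligner did not write BAM directly.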

The GATK pipeline consists of a number of steps, and I believe they have an RNA variant detection pipeline. More specifically, the SNP discovery steps usually consist of HaplotypeCaller (this step takes in your bam files), GenomicsDBImport, GenotypeGVCFs (you might need GatherVcfs) and SelectVariants. The GATK website has a lot of really good documentation with examples.
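The order of those GATK steps can be sketched as a helper that builds the command lines for a set of samples. This is an outline under assumptions, not the recommended RNA-seq workflow itself: the function name, workspace name, output names, and the intervals file are hypothetical, and GATK additionally needs the reference to have a `.fai` index and a `.dict` sequence dictionary.

```python
from pathlib import Path


def gatk_snp_commands(reference, bams, intervals, workspace="gendb_workspace"):
    """Build (but do not run) GATK commands for joint SNP calling."""
    cmds, gvcfs = [], []
    for bam in bams:
        # One GVCF per sample.
        gvcf = str(Path(bam).with_suffix(".g.vcf.gz"))
        cmds.append(["gatk", "HaplotypeCaller", "-R", reference,
                     "-I", bam, "-O", gvcf, "-ERC", "GVCF"])
        gvcfs.append(gvcf)

    # Merge the per-sample GVCFs into one GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", intervals]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    cmds.append(import_cmd)

    # Joint genotyping across all samples.
    cmds.append(["gatk", "GenotypeGVCFs", "-R", reference,
                 "-V", f"gendb://{workspace}", "-O", "cohort.vcf.gz"])
    # Keep SNPs only.
    cmds.append(["gatk", "SelectVariants", "-R", reference,
                 "-V", "cohort.vcf.gz", "--select-type-to-include", "SNP",
                 "-O", "cohort.snps.vcf.gz"])
    return cmds
```

With a Trinity reference the intervals file would list the contigs to process; with very many contigs it may be worth checking the GATK documentation for how GenomicsDBImport handles large interval lists.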

For filtering, there is a hard-filtering step that you can use in GATK with recommended settings, and afterwards vcftools can be used for further filtering if required, for example on depth. ADMIXTURE is an analysis tool for examining population structure, and I think it takes in .ped/.map or .bed files. You can convert a vcf file to these using plink or vcftools.
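Since the question is mostly about allele frequency, here is a small pure-Python sketch of what a site-level filter on minor allele frequency and mean depth computes for one VCF record. On real data vcftools does this much faster (its `--maf` and `--min-meanDP` options); the function names and thresholds below are illustrative assumptions, and the code only handles biallelic sites with GT and DP in the FORMAT column.

```python
def site_stats(vcf_line):
    """Return (alt_allele_frequency, mean_depth) for one biallelic record."""
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    gt_idx = fmt_keys.index("GT")
    dp_idx = fmt_keys.index("DP") if "DP" in fmt_keys else None

    alt_count = called = 0
    depths = []
    for sample in fields[9:]:
        values = sample.split(":")
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in values[gt_idx].replace("|", "/").split("/"):
            if allele != ".":          # skip missing calls
                called += 1
                alt_count += (allele == "1")
        if dp_idx is not None and dp_idx < len(values) and values[dp_idx].isdigit():
            depths.append(int(values[dp_idx]))

    af = alt_count / called if called else None
    mean_dp = sum(depths) / len(depths) if depths else None
    return af, mean_dp


def keep_site(vcf_line, min_maf=0.05, min_mean_dp=10):
    """True if the site passes both the MAF and the mean-depth threshold."""
    af, mean_dp = site_stats(vcf_line)
    if af is None or mean_dp is None:
        return False
    return min(af, 1 - af) >= min_maf and mean_dp >= min_mean_dp
```

Because this counts hard genotype calls, it is not likelihood-aware; it is only meant to show what the thresholds act on.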

Additionally, the bioinformatics methods in this paper may be of use to you.

Thank you so much ! Very helpful :)
