Selection of good quality SRA data for SNP analysis.
4.0 years ago

We are currently working on a project involving SNP analysis of the Mycobacterium tuberculosis genome, using a GATK/SAMtools pipeline.

We have downloaded SRA data (FASTQ files). We plan to run FastQC, but first we wanted to know whether there is a way to gauge the quality of a file beforehand. For example, it was suggested to us that a larger file size indicates better-quality reads.

Are there any other indicators that we can check at face value, before running FastQC, that reflect the quality of the SRA data?

SNP sequence • 749 views

"It was suggested to us that larger file size indicates better quality reads"

There is no correlation between quality and data size. A larger file means a larger number of reads, but that is about it.

While you have not asked, do you have to use GATK for bacteria? It is going to make your analysis more difficult.


Thanks for that! Are there any alternatives to GATK that are better suited to bacteria?


Have you tried Snippy?


Agreed. To add to this, larger files can also simply mean longer reads, without the quality being any different. You will always have to download the data and perform QC yourself to get an idea of the quality.
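
To illustrate why file size says nothing about quality, here is a rough Python sketch that pulls the three quantities apart: number of reads, mean read length, and mean per-base Phred score. It is not a replacement for FastQC. The function names `fastq_summary` and `open_fastq` are made up for this example, and it assumes plain four-line FASTQ records with Phred+33 quality encoding (typical for recent Illumina data, but check your own files).

```python
import gzip
import io


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_summary(handle, phred_offset=33):
    """Return read count, mean read length and mean per-base Phred score.

    Assumes four lines per record and Phred+33 encoding (an assumption,
    not something the file itself guarantees).
    """
    n_reads = 0
    n_bases = 0
    qual_sum = 0
    while True:
        header = handle.readline()
        if not header:          # end of file
            break
        if not header.strip():  # skip stray blank lines
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()       # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        n_reads += 1
        n_bases += len(seq)
        qual_sum += sum(ord(c) - phred_offset for c in qual)
    if n_bases == 0:
        return {"reads": 0, "mean_length": 0.0, "mean_quality": 0.0}
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads,
        "mean_quality": qual_sum / n_bases,
    }


# Toy data: the "large" file has three times as many reads,
# yet its base qualities are far lower than those of the "small" one.
small = io.StringIO("@r1\nACGT\n+\nIIII\n")
large = io.StringIO("@r1\nACGT\n+\n####\n" * 3)

print("small:", fastq_summary(small))
print("large:", fastq_summary(large))
```

For a real download you would call something like `fastq_summary(open_fastq("sample_1.fastq.gz"))`, where the file name is only a placeholder. A single mean Phred score is a crude summary, so treat it as a quick sanity check and still run a proper QC tool before variant calling.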
