Question

Query related to GATK

7.0 years ago

Bioinfonext ▴ 460

I have whole-genome resequencing Illumina reads from two contrasting genotypes.

I have a few queries regarding GATK analysis.

Objective: I want to identify the homozygous SNPs and indels between these two genotypes by mapping the raw reads against the reference genome.

Which pre-filtering parameters do I need to take care of before starting the GATK pipeline?

I have already removed the adapters and low-quality bases from the reads. Do I also need to remove repetitive reads? If yes, please suggest how to do it. What other read pre-filtering parameters should I look at?

In the GATK pipeline, why are we creating a sequence dictionary? Where is it used? What is the role of assigning a read group? How do I assign a read group? Does it need specific features, or can I just put any random name?

Create sequence dictionary

java -jar ~/bin/picard-tools-1.8.5/CreateSequenceDictionary.jar REFERENCE=reference.fasta OUTPUT=reference.dict

Align reads and assign read group

bwa mem -R "@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:test\tSM:PA01" reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam

SNP • 1.3k views

ADD COMMENT • link updated 7.0 years ago by Pierre Lindenbaum 161k • written 7.0 years ago by Bioinfonext ▴ 460

I have formatted your code correctly. In future, use the icon shown below (after highlighting the text you want to format as code) when editing (screenshot courtesy of @Wouter).

[Image: ScreenCap]

ADD REPLY • link 7.0 years ago by GenoMax 141k

You are certainly boosting my citations, my supervisor will be pleased!

(but OP doesn't learn what we suggest about formatting so it's kinda useless)

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

In the GATK pipeline, why are we creating a sequence dictionary? Where is it used? What is the role of assigning a read group? How do I assign a read group? Does it need specific features, or can I just put any random name?

If you take the time to search biostars you will find answers for all those questions.

ADD REPLY • link 7.0 years ago by GenoMax 141k

score 2 · Accepted Answer · 2017-04-20

7.0 years ago

Pierre Lindenbaum 161k

In the GATK pipeline, why are we creating a sequence dictionary?

short: because GATK needs it

long: because GATK needs to know the names, order, and sizes of the contigs in order to 'break' the analysis into parts, or to check the compatibility between your BAM/VCF and your reference

long+: the index made by samtools faidx doesn't contain all the information, such as organism, checksum, etc.
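To make the "name, order, size" point concrete, here is a minimal sketch (not part of the original answer, and not a replacement for Picard) that lists each contig's name and length from a FASTA file with awk. These are the values that end up in the SN: and LN: fields of the @SQ lines of a .dict file; the real dictionary also carries extras such as the checksum. The function name fasta_lengths and the demo contigs are made up for illustration.

```shell
# Print "name<TAB>length" for every contig in a FASTA file, in file order.
# These are the core values a sequence dictionary stores per contig.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # contig name: text after ">" up to the first whitespace
            len = 0
            next
        }
        { len += length($0) }      # add up sequence characters across wrapped lines
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Tiny demo FASTA (hypothetical contigs) so the sketch runs on its own.
demo_fasta=$(mktemp)
printf '>chr1 demo contig\nACGTACGT\nACGT\n>chr2\nGGCC\n' > "$demo_fasta"
fasta_lengths "$demo_fasta"    # prints chr1 with length 12, then chr2 with length 4
rm -f "$demo_fasta"
```

On a real reference you would call fasta_lengths reference.fasta and compare the result with the @SQ lines in reference.dict or in your BAM header; a mismatch in names, order, or lengths is the kind of incompatibility GATK complains about.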

What is the role of assigning a read group?

To get the name of your sample in the VCF, to help MarkDuplicates (reads from different lanes can't be optical duplicates), to link the sample to a pedigree, etc.

How do I assign a read group? Does it need specific features, or can I just put any random name?

Any random name. But you'll see it in the 'RG:Z' tag in the samtools view output, so the best thing is to use something that contains your sample name.
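As a rough illustration of "use something that contains your sample name", the sketch below builds the read group string from a few variables instead of typing it by hand. The helper name make_read_group and the values passed to it are assumptions taken from the command in the question, not a required naming scheme.

```shell
# Build a read group header line for bwa mem -R.
# Arguments: flowcell, lane, library, sample (hypothetical values below).
make_read_group() {
    # The literal two-character "\t" is kept on purpose:
    # bwa mem converts it to a real tab when it writes the SAM header.
    printf '@RG\\tID:%s.LANE%s\\tPL:ILLUMINA\\tLB:%s\\tSM:%s\n' "$1" "$2" "$3" "$4"
}

rg=$(make_read_group FLOWCELL1 1 test PA01)
printf '%s\n' "$rg"

# Alignment step, left as a comment because it needs bwa and real input files:
# bwa mem -R "$rg" reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam
```

The SM: field is what shows up as the sample name in the VCF, so for two genotypes you would call the helper once per sample with a different last argument.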

ADD COMMENT • link 7.0 years ago by Pierre Lindenbaum 161k

I am not able to understand how to create a dictionary using the latest version of Picard, 2.8.1.

It is different from the previous version.

Should I use i for input instead of REFERENCE, and what is the option for output?

Old PICARD 1.8.1 COMMAND:

jar /home/yog/software/picard-1.8.1/src/main/java/picard/sam/CreateSequenceDictionary.java REFERENCE=R1_R9.fasta OUTPUT=R1_R9.dict

New Command options:

jar /home/yog/software/picard-2.8.1/src/main/java/picard/sam/CreateSequenceDictionary.java

Illegal option: /

Usage: jar {ctxui}[vfmn0PMe] [jar-file] [manifest-file] [entry-point] [-C dir] files ... Options:

-c  create new archive

-t  list table of contents for archive

-x  extract named (or all) files from archive

-u  update existing archive

-v  generate verbose output on standard output

-f  specify archive file name

-m  include manifest information from specified manifest file

-e  specify application entry point for stand-alone application

    bundled into an executable jar file

-0  store only; use no ZIP compression

-P  preserve leading '/' (absolute path) and ".." (parent directory) components from file names

-M  do not create a manifest file for the entries

-i  generate index information for the specified jar files

-C  change to the specified directory and include the following file

If any file is a directory then it is processed recursively. The manifest file name, the archive file name and the entry point name are specified in the same order as the 'm', 'f' and 'e' flags.

ADD REPLY • link 7.0 years ago by Bioinfonext ▴ 460

Old PICARD 1.8.1 COMMAND:

jar /home/yog/software/picard-1.8.1/src/main/java/picard/sam/CreateSequenceDictionary.java

You're wrong. The old command was

java -jar /path/to/CreateSequenceDictionary.jar ## <- .jar, NOT .java: Java is a compiled language, so you run the compiled .jar

the new command is

java -jar  /path/to/picard.jar CreateSequenceDictionary
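For the input and output options asked about above, a hedged sketch of the full call might look like this. It assumes the REFERENCE= and OUTPUT= arguments from the old command carry over unchanged to the single picard.jar (I believe R= and O= are accepted as short forms, but check the usage message of your version). The jar path is a placeholder and the file names are the ones from the question.

```shell
picard_jar=/path/to/picard.jar   # placeholder path, adjust to your installation
ref=R1_R9.fasta
dict="${ref%.*}.dict"            # R1_R9.fasta -> R1_R9.dict

# Print the command first so it can be checked before running it.
printf 'java -jar %s CreateSequenceDictionary REFERENCE=%s OUTPUT=%s\n' \
    "$picard_jar" "$ref" "$dict"

# Run it only if Java and the jar are actually present.
if command -v java >/dev/null 2>&1 && [ -f "$picard_jar" ]; then
    java -jar "$picard_jar" CreateSequenceDictionary REFERENCE="$ref" OUTPUT="$dict"
fi
```

Deriving the .dict name from the FASTA name keeps the two files side by side with matching base names, which is the layout used in the commands earlier in the thread.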

ADD REPLY • link 7.0 years ago by Pierre Lindenbaum 161k

Thanks a lot! Now it is working....

ADD REPLY • link 7.0 years ago by Bioinfonext ▴ 460

Hi Pierre,

One more thing I want to know: after running CreateSequenceDictionary.jar, and before running bwa mem, I need to run bwa index to create the index files. Where, then, will the file generated by CreateSequenceDictionary be used?