Question

CALLING SNPS afther aligmnet with BBMap

0

Entering edit mode

7.8 years ago

cristina_sabiers ▴ 110

Well... My god have to fix a lot things before Im able to call my snps

I got my bam file directly from fastq using BBMap (thanks to genomax)

Of course my bam file is not sorted and no index, so I do...

samtools sort out.bam sorted.bam
samtools index sorted.bam sorted.bai

Then I continue at next step

java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I sorted.bam  -o sorted.intervals

and I get this error message

Error details: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups

and I use this code to fix it

java -jar /home/cri/Desktop/picard1/AddOrReplaceReadGroups.jar I=sorted.bam O=header.bam RGLB=LIBRARY RGPL="Ion Torrent" RGPU=RUN RGSM=SAMPLE RGCN=BCM

Then I try again

java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I header.bam  -o header.intervals

Then I get another error (i jump to error to error... ^" )

 ERROR MESSAGE: Invalid command line: Cannot process the provided BAM/CRAM file(s) because they were not indexed.  The GATK does offer limited processing of unindexed BAM/CRAMs in --unsafe mode, but this feature is unsupported -- use it at your own risk!

I realize my bai file was called before sorted.bai so I rename to header.bai...didn't work either, then I index again my header.bam

samtools index header.bam header.bai

and Identify target regions for realignment finally works....

why I need to do twice the index?

bbmap SNP • 3.7k views

ADD COMMENT • link 7.8 years ago by cristina_sabiers ▴ 110

2

Entering edit mode

GATK is the only program that requires read groups and there is no getting around that requirement. Next time add the readgroups before sorting and indexing.

ADD REPLY • link 7.8 years ago by GenoMax 141k

0

Entering edit mode

Thanks to all for the tips :) (I put in my notes)

My pc works now on step: Local realignment around indels :))))

java -Xmx4g -Djava.io.tmpdir=/tmp \

-jar GenomeAnalysisTK.jar \

-I input.marked.bam \

-R hg19.fa \

-T IndelRealigner \

-targetIntervals input.bam.list \

-o input.marked.realigned.bam

seems is going to take looong time.... my god need lot patient to do this kind of job...or maybe a better pc :P

ADD REPLY • link updated 7.8 years ago by GenoMax 141k • written 7.8 years ago by cristina_sabiers ▴ 110

1

Entering edit mode

Yes it will take a while. A few steps in GATK can be sped up by throwing more cores at it (and of course there is the trick of splitting up the analysis by chromosome after alignment and running the pipeline).

ADD REPLY • link 7.8 years ago by Chris Fields ★ 2.2k

0

Entering edit mode

Thanks Chris

ok..so I can split the jobb, smart...If the power electricity go out at home I will cry, I will see if find the code how to do it!

I have now 14h left! My pc just have 4cores amd, I invested most at ram (32gb) and I wait until they will pay me at work (this is Spain...) so I can look for a better CPU :)

ADD REPLY • link 7.8 years ago by cristina_sabiers ▴ 110

0

Entering edit mode

If they don't pay you the least they can do is give you access to a server/cluster at work? You really should be doing this type of work on a remote machine that is beefy and is on a UPS, if you have unreliable power at home.

ADD REPLY • link 7.8 years ago by GenoMax 141k

0

Entering edit mode

well..Nearly three days took for do realignment around indels...

Now Im stock at this step

1) Count covariates:

java -Xmx4g -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar \
-l INFO \
-R hg19.fa \
--DBSNP dbsnp132.txt \
-I input.header.realigned.bam \
-T CountCovariates \
-cov ReadGroupCovariate \
-cov QualityScoreCovariate \
-cov CycleCovariate \
-cov DinucCovariate \
-recalFile input.recal_data.csv

I get this error

ERROR MESSAGE: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0 (use BaseRecalibrator instead; see documentation for usage)

So I check for next code:

 java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar \
   -T BaseRecalibrator \
   -R hg19.fa \
   -I input.header.realigned.bam \
   -knownSites latest_dbsnp.vcf \
   -o recal_data.table

ERROR MESSAGE: Could not read file /media/cri/CRIS_DATA/ULT/OTRO COPIA/otro/latest_dbsnp.vcf because file 'latest_dbsnp.vcf' does not exist

Im bit losted how to get that vcf file

location: ftp.broadinstitute.org username: gsapubftp-anonymous password: <blank>

I go to

ftp://ftp.broadinstitute.org/distribution/human_SNP_releases/ but dont see any vcf file to download

Thanks

ADD REPLY • link 7.8 years ago by cristina_sabiers ▴ 110

0

Entering edit mode

Which guide are you following for this exercise? Can you post a link?

Also use ADD REPLY when posting additional information like this. SUBMIT ANSWER should only be used for new answers for the original question.

ADD REPLY • link 7.8 years ago by GenoMax 141k

0

Entering edit mode

I did wrong....sorry

http://seqanswers.com/wiki/How-to/exome_analysis

Thats the main pipeline I follow, but at the end many steps has been modified like use BBMap...

Now I down the file from here, dunno if its the right one

ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/

Thanks!

(I fixed one of the error message....I did mistake with copy and paste)

ADD REPLY • link 7.8 years ago by cristina_sabiers ▴ 110

score 1 · Answer 1 · 2016-07-02

1

Entering edit mode

7.8 years ago

Chris Fields ★ 2.2k

I agree with genomax. Also something to consider, many aligners have an option to do this when you initially run the alignment. bwa mem for instance has the -R option:

       -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]

ADD COMMENT • link 7.8 years ago by Chris Fields ★ 2.2k

2

Entering edit mode

BBMap also has this option, with the "rgid=" flag for setting the readgroup ID, and similar flags for setting other readgroup fields. I don't think readgroups are used by many tools other than GATK, though. Maybe Picard.

Incidentally, BBMap can also generate a shellscript for you to produce a sorted, indexed bam file (assuming samtools is installed)... "bamscript=bamscript.sh" (or bs for short - "bs=bs.sh"). So you can run this command:

bbmap.sh stuff bs=bs.sh; sh bs.sh

...and when it completes there will be a sorted, indexed bam file + bai file. I added that because it's a common task and I find it hard to remember the syntax.

ADD REPLY • link 7.8 years ago by Brian Bushnell 20k

0

Entering edit mode

Thanks Brian

Next time I try with another fastq file I will put that commands on BBMap, for me all of this is new so I learn on a hard way ^^" I appreciate all the help.

ADD REPLY • link 7.8 years ago by cristina_sabiers ▴ 110