Question: CALLING SNPS afther aligmnet with BBMap
0
gravatar for cristina_sabiers
2.7 years ago by
Spain
cristina_sabiers40 wrote:

Well... My god have to fix a lot things before Im able to call my snps

I got my bam file directly from fastq using BBMap (thanks to genomax)

Of course my bam file is not sorted and no index, so I do...

samtools sort out.bam sorted.bam
samtools index sorted.bam sorted.bai

Then I continue at next step

java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I sorted.bam  -o sorted.intervals

and I get this error message

Error details: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups

and I use this code to fix it

java -jar /home/cri/Desktop/picard1/AddOrReplaceReadGroups.jar I=sorted.bam O=header.bam RGLB=LIBRARY RGPL="Ion Torrent" RGPU=RUN RGSM=SAMPLE RGCN=BCM

Then I try again

java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I header.bam  -o header.intervals

Then I get another error (i jump to error to error... ^" )

 ERROR MESSAGE: Invalid command line: Cannot process the provided BAM/CRAM file(s) because they were not indexed.  The GATK does offer limited processing of unindexed BAM/CRAMs in --unsafe mode, but this feature is unsupported -- use it at your own risk!

I realize my bai file was called before sorted.bai so I rename to header.bai...didn't work either, then I index again my header.bam

samtools index header.bam header.bai

and Identify target regions for realignment finally works....

why I need to do twice the index?

snp bbmap • 1.6k views
ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by cristina_sabiers40
2

GATK is the only program that requires read groups and there is no getting around that requirement. Next time add the readgroups before sorting and indexing.

ADD REPLYlink written 2.7 years ago by genomax64k

Thanks to all for the tips :) (I put in my notes)

My pc works now on step: Local realignment around indels :))))

java -Xmx4g -Djava.io.tmpdir=/tmp \

-jar GenomeAnalysisTK.jar \

-I input.marked.bam \

-R hg19.fa \

-T IndelRealigner \

-targetIntervals input.bam.list \

-o input.marked.realigned.bam

seems is going to take looong time.... my god need lot patient to do this kind of job...or maybe a better pc :P

ADD REPLYlink modified 2.7 years ago by genomax64k • written 2.7 years ago by cristina_sabiers40
1

Yes it will take a while. A few steps in GATK can be sped up by throwing more cores at it (and of course there is the trick of splitting up the analysis by chromosome after alignment and running the pipeline).

ADD REPLYlink written 2.7 years ago by Chris Fields2.1k

Thanks Chris

ok..so I can split the jobb, smart...If the power electricity go out at home I will cry, I will see if find the code how to do it!

I have now 14h left! My pc just have 4cores amd, I invested most at ram (32gb) and I wait until they will pay me at work (this is Spain...) so I can look for a better CPU :)

ADD REPLYlink written 2.7 years ago by cristina_sabiers40

If they don't pay you the least they can do is give you access to a server/cluster at work? You really should be doing this type of work on a remote machine that is beefy and is on a UPS, if you have unreliable power at home.

ADD REPLYlink written 2.7 years ago by genomax64k

well..Nearly three days took for do realignment around indels...

Now Im stock at this step

1) Count covariates:

java -Xmx4g -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar \
-l INFO \
-R hg19.fa \
--DBSNP dbsnp132.txt \
-I input.header.realigned.bam \
-T CountCovariates \
-cov ReadGroupCovariate \
-cov QualityScoreCovariate \
-cov CycleCovariate \
-cov DinucCovariate \
-recalFile input.recal_data.csv

I get this error

ERROR MESSAGE: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0 (use BaseRecalibrator instead; see documentation for usage)

So I check for next code:

 java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar \
   -T BaseRecalibrator \
   -R hg19.fa \
   -I input.header.realigned.bam \
   -knownSites latest_dbsnp.vcf \
   -o recal_data.table

ERROR MESSAGE: Could not read file /media/cri/CRIS_DATA/ULT/OTRO COPIA/otro/latest_dbsnp.vcf because file 'latest_dbsnp.vcf' does not exist

Im bit losted how to get that vcf file

location: ftp.broadinstitute.org username: gsapubftp-anonymous password: <blank>

I go to

ftp://ftp.broadinstitute.org/distribution/human_SNP_releases/ but dont see any vcf file to download

Thanks

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by cristina_sabiers40

Which guide are you following for this exercise? Can you post a link?

Also use ADD REPLY when posting additional information like this. SUBMIT ANSWER should only be used for new answers for the original question.

ADD REPLYlink written 2.7 years ago by genomax64k

I did wrong....sorry

http://seqanswers.com/wiki/How-to/exome_analysis

Thats the main pipeline I follow, but at the end many steps has been modified like use BBMap...

Now I down the file from here, dunno if its the right one

ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/

Thanks!

(I fixed one of the error message....I did mistake with copy and paste)

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by cristina_sabiers40
1
gravatar for Chris Fields
2.7 years ago by
Chris Fields2.1k
University of Illinois Urbana-Champaign
Chris Fields2.1k wrote:

I agree with genomax. Also something to consider, many aligners have an option to do this when you initially run the alignment. bwa mem for instance has the -R option:

       -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
ADD COMMENTlink written 2.7 years ago by Chris Fields2.1k
2

BBMap also has this option, with the "rgid=" flag for setting the readgroup ID, and similar flags for setting other readgroup fields. I don't think readgroups are used by many tools other than GATK, though. Maybe Picard.

Incidentally, BBMap can also generate a shellscript for you to produce a sorted, indexed bam file (assuming samtools is installed)... "bamscript=bamscript.sh" (or bs for short - "bs=bs.sh"). So you can run this command:

bbmap.sh stuff bs=bs.sh; sh bs.sh

...and when it completes there will be a sorted, indexed bam file + bai file. I added that because it's a common task and I find it hard to remember the syntax.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Brian Bushnell16k

Thanks Brian

Next time I try with another fastq file I will put that commands on BBMap, for me all of this is new so I learn on a hard way ^^" I appreciate all the help.

ADD REPLYlink written 2.7 years ago by cristina_sabiers40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 766 users visited in the last hour