CALLING SNPS afther aligmnet with BBMap
1
0
Entering edit mode
6.6 years ago

Well... My god have to fix a lot things before Im able to call my snps

I got my bam file directly from fastq using BBMap (thanks to genomax)

Of course my bam file is not sorted and no index, so I do...

samtools sort out.bam sorted.bam
samtools index sorted.bam sorted.bai


Then I continue at next step

java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I sorted.bam  -o sorted.intervals


and I get this error message

Error details: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups


and I use this code to fix it

java -jar /home/cri/Desktop/picard1/AddOrReplaceReadGroups.jar I=sorted.bam O=header.bam RGLB=LIBRARY RGPL="Ion Torrent" RGPU=RUN RGSM=SAMPLE RGCN=BCM


Then I try again

java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I header.bam  -o header.intervals


Then I get another error (i jump to error to error... ^" )

 ERROR MESSAGE: Invalid command line: Cannot process the provided BAM/CRAM file(s) because they were not indexed.  The GATK does offer limited processing of unindexed BAM/CRAMs in --unsafe mode, but this feature is unsupported -- use it at your own risk!


I realize my bai file was called before sorted.bai so I rename to header.bai...didn't work either, then I index again my header.bam

samtools index header.bam header.bai


and Identify target regions for realignment finally works....

why I need to do twice the index?

bbmap SNP • 3.2k views
2
Entering edit mode

GATK is the only program that requires read groups and there is no getting around that requirement. Next time add the readgroups before sorting and indexing.

0
Entering edit mode

Thanks to all for the tips :) (I put in my notes)

My pc works now on step: Local realignment around indels :))))

java -Xmx4g -Djava.io.tmpdir=/tmp \

-jar GenomeAnalysisTK.jar \

-I input.marked.bam \

-R hg19.fa \

-T IndelRealigner \

-targetIntervals input.bam.list \

-o input.marked.realigned.bam


seems is going to take looong time.... my god need lot patient to do this kind of job...or maybe a better pc :P

1
Entering edit mode

Yes it will take a while. A few steps in GATK can be sped up by throwing more cores at it (and of course there is the trick of splitting up the analysis by chromosome after alignment and running the pipeline).

0
Entering edit mode

Thanks Chris

ok..so I can split the jobb, smart...If the power electricity go out at home I will cry, I will see if find the code how to do it!

I have now 14h left! My pc just have 4cores amd, I invested most at ram (32gb) and I wait until they will pay me at work (this is Spain...) so I can look for a better CPU :)

0
Entering edit mode

If they don't pay you the least they can do is give you access to a server/cluster at work? You really should be doing this type of work on a remote machine that is beefy and is on a UPS, if you have unreliable power at home.

0
Entering edit mode

well..Nearly three days took for do realignment around indels...

Now Im stock at this step

1) Count covariates:

java -Xmx4g -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar \
-l INFO \
-R hg19.fa \
--DBSNP dbsnp132.txt \
-T CountCovariates \
-cov QualityScoreCovariate \
-cov CycleCovariate \
-cov DinucCovariate \
-recalFile input.recal_data.csv


I get this error

ERROR MESSAGE: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0 (use BaseRecalibrator instead; see documentation for usage)

So I check for next code:

 java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R hg19.fa \
-knownSites latest_dbsnp.vcf \
-o recal_data.table


ERROR MESSAGE: Could not read file /media/cri/CRIS_DATA/ULT/OTRO COPIA/otro/latest_dbsnp.vcf because file 'latest_dbsnp.vcf' does not exist

Im bit losted how to get that vcf file

I go to

Thanks

0
Entering edit mode

Which guide are you following for this exercise? Can you post a link?

Also use ADD REPLY when posting additional information like this. SUBMIT ANSWER should only be used for new answers for the original question.

0
Entering edit mode

I did wrong....sorry

Thats the main pipeline I follow, but at the end many steps has been modified like use BBMap...

Now I down the file from here, dunno if its the right one

ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/

Thanks!

(I fixed one of the error message....I did mistake with copy and paste)

1
Entering edit mode
6.6 years ago
Chris Fields ★ 2.2k

I agree with genomax. Also something to consider, many aligners have an option to do this when you initially run the alignment. bwa mem for instance has the -R option:

       -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]

2
Entering edit mode

BBMap also has this option, with the "rgid=" flag for setting the readgroup ID, and similar flags for setting other readgroup fields. I don't think readgroups are used by many tools other than GATK, though. Maybe Picard.

Incidentally, BBMap can also generate a shellscript for you to produce a sorted, indexed bam file (assuming samtools is installed)... "bamscript=bamscript.sh" (or bs for short - "bs=bs.sh"). So you can run this command:

bbmap.sh stuff bs=bs.sh; sh bs.sh

...and when it completes there will be a sorted, indexed bam file + bai file. I added that because it's a common task and I find it hard to remember the syntax.

0
Entering edit mode

Thanks Brian

Next time I try with another fastq file I will put that commands on BBMap, for me all of this is new so I learn on a hard way ^^" I appreciate all the help.