Hi everyone,
I have a question regarding a ChIP-seq pipeline I am adapting to my analysis.
First, I sorted and indexed my BAM files, then I used the following script to remove duplicates and multimappers. At the end, I obtained two folders: one without duplicates, containing BAM and BAI files, and one without multimappers, containing only BAM files.
I know that I will need BAI files in the next steps. Can I merge the two folders even if my files have different names?
Or can I create the BAI files using samtools or another tool?
Hope my explanation is not too confusing...
Thank you in advance for the help.
#!/bin/bash
# Paths to bamtools and to the project directory
bamtools="/home/anaconda3/bin/bamtools"
io="/media/jay/Data/ChipSeq_data"

mkdir -p "$io/BAM/duplicates/"
mkdir -p "$io/BAM/multimappers/"

# bamtools filter rules: keep mapped reads with mapping quality above 4
cat > "$io/filter.json" <<- EOM
{
"isMapped" : "true",
"mapQuality" : ">4"
}
EOM

# Loop over the sorted BAM files (a glob is safer than parsing ls output)
for i in "$io"/BAM/sorted/*.sorted.bam
do
    sample_name=$(basename "$i" ".sorted.bam")
    echo "$i => $sample_name"

    # mark and remove duplicates (CREATE_INDEX=true also writes a .bai file)
    duplicates="$io/BAM/duplicates/$sample_name.removed.dup.bam"
    java -Xmx8G -jar /home/jay/picard/build/libs/picard-2.27.4-SNAPSHOT-all.jar MarkDuplicates \
        INPUT="$i" \
        OUTPUT="$duplicates" \
        METRICS_FILE="$io/BAM/duplicates/$sample_name.metrics.txt" \
        OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 \
        CREATE_INDEX=true \
        REMOVE_DUPLICATES=true

    # remove multimappers after removing the duplicates
    # (bamtools filter writes only the BAM file, no index)
    multimappers="$io/BAM/multimappers/$sample_name.filtered.bam"
    "$bamtools" filter -in "$duplicates" -out "$multimappers" -script "$io/filter.json"
done
OK, thank you for the quick reply. I was not sure it was correct to recreate the BAI files at this step!
No worries, happy to help. Having more index files than strictly needed never hurts: they take far less space than the BAM files. Whether it is correct to generate them at this step depends on what you are going to do downstream. In practice it is as simple as whether the next tool you feed the BAM file to complains about a missing BAI file or not :)
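In case it is useful, here is a minimal sketch of indexing every BAM in a folder that does not have a BAI yet. It assumes samtools is on your PATH and that the filtered BAMs are still coordinate-sorted (filtering a sorted BAM should not change the read order). The function name `index_missing_bai` is just something I made up for illustration.

```shell
#!/bin/bash
# Sketch: create a .bai index for every BAM in a directory that lacks one.
# Assumption: samtools is installed and available on PATH.
index_missing_bai() {
    local dir="$1"
    local bam
    for bam in "$dir"/*.bam; do
        # Skip the literal pattern when the directory has no BAM files
        [ -e "$bam" ] || continue
        # samtools names the index <file>.bam.bai, Picard names it <file>.bai,
        # so check for both before indexing again
        if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
            echo "indexing $bam"
            samtools index "$bam"
        fi
    done
}
```

With the paths from your script you would call it as `index_missing_bai /media/jay/Data/ChipSeq_data/BAM/multimappers`. BAMs that already have an index are left alone, so running it again is harmless.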
clear and simple! thanks again :)