Problem aligning FASTQ files with the BWA aligner
3.2 years ago
modarzi ▴ 170

Dear All, I need to set up BWA for alignment. I followed the steps below:

1- Download the hg19 reference genome with 'wget' from the link below:

wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz

2- Then extract the downloaded file:

tar zvfx chromFa.tar.gz

3- Concatenate the .fa files into wg.fa with 'cat':

cat *.fa > wg.fa

4- Remove the per-chromosome files:

rm chr*.fa

5- Now I generate the index files with the command below, using bwa version 0.7.17-r1188:

./bwa index -p hg19bwaidx -a bwtsw wg.fa

After that, the folder contains these 5 files:
hg19bwaidx.amb, hg19bwaidx.ann, hg19bwaidx.bwt, hg19bwaidx.pac, hg19bwaidx.sa

6- Now I would like to generate a SAM file for paired-end reads with the mem algorithm, using the command below:

./bwa mem  -T 19  ?????  file_paired_1.fastq  file_paired_2.fastq  > aln.sam

My question: which of the five files generated during indexing (step 5) should I use in place of ????? in step 6?

I would appreciate any comments.

Best Regards,

genome BWA alignment • 993 views
3- 'cat' .fa file to wg.fa :

cat *.fa > wg.fa

If you want to use GATK in your downstream analysis, you should pay attention to the order of the chromosomes. The shell expands *.fa in lexicographic order, so the chromosomes end up as "chr1", "chr10", "chr11", ... and GATK raises an error if they are in that order instead of "chr1", "chr2", "chr3", ....
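
One way to get that order is to list the files explicitly instead of relying on the glob. A minimal sketch, assuming the per-chromosome files are named chr1.fa ... chr22.fa, chrX.fa, chrY.fa and chrM.fa as in the UCSC archive. The function name concat_in_order is only for illustration, the position of chrM at the end is an assumption, and the sketch leaves out the random, unplaced and haplotype contigs:

```shell
# Concatenate per-chromosome FASTA files in numeric order
# (chr1..chr22, then chrX, chrY, chrM) instead of lexicographic glob order.
# concat_in_order is a hypothetical helper; files not in the list are skipped.
concat_in_order() {
    local out="$1"
    local chrom
    : > "$out"                       # create or truncate the output file
    for chrom in $(seq 1 22) X Y M; do
        if [ -f "chr${chrom}.fa" ]; then
            cat "chr${chrom}.fa" >> "$out"
        else
            echo "warning: chr${chrom}.fa not found, skipping" >&2
        fi
    done
}

# Usage: concat_in_order wg.fa
```

You can check the resulting order with: grep '>' wg.fa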


Thank you for your comment. Honestly, I need a BWA-generated SAM file as input for finding circular RNAs with CircularRNA finder. Up to now, I couldn't find any circular RNA in my samples. Could you please advise what I should use instead of the command below:

cat *.fa > wg.fa
3.2 years ago
ATpoint 82k
./bwa mem  -T 19  hg19bwaidx  file_paired_1.fastq  file_paired_2.fastq  > aln.sam

Just use the basename of the index files.

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

Pro tip, be safe and do:

find -maxdepth 1 -name "*.fa" | xargs cat > wg.fa

Otherwise the shell may treat the output file as an additional input file, since wg.fa itself matches *.fa, and append the output to itself.
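
Another way to rule this out is to use an input pattern that the output name can never match. A minimal sketch, assuming (as in step 4 of the question) that all per-chromosome files match chr*.fa. The function name merge_chromosomes and the temporary file name are only for illustration:

```shell
# Concatenate only the per-chromosome files. The pattern chr*.fa cannot
# match the output file wg.fa, so the output is never read as input.
# Writing to a temporary name and renaming at the end avoids leaving a
# half-written wg.fa behind if cat fails.
# merge_chromosomes is a hypothetical helper.
merge_chromosomes() {
    local out="${1:-wg.fa}"
    local tmp_out="${out}.tmp.$$"
    if cat chr*.fa > "$tmp_out"; then
        mv "$tmp_out" "$out"
    else
        rm -f "$tmp_out"
        return 1
    fi
}

# Usage: merge_chromosomes wg.fa
```

Note that the glob still expands in lexicographic order, so this only addresses the self-inclusion problem, not the chromosome order.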


Thank you for your comment. Now, which file represents the reference genome: wg.fa or hg19bwaidx?

3.2 years ago

Just use the prefix hg19bwaidx
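
To spell it out: wg.fa is the reference sequence itself, and hg19bwaidx is the prefix shared by the five index files that step 5 built from it. bwa mem takes the prefix, not any single one of those files. A small sketch to confirm that all five files exist for a prefix before aligning; check_bwa_index is a hypothetical helper, not part of bwa:

```shell
# Check that the five files written by `bwa index -p <prefix>` exist
# and are non-empty. check_bwa_index is a hypothetical helper.
check_bwa_index() {
    local prefix="$1"
    local ext
    local missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: run the alignment only if the index looks complete.
# check_bwa_index hg19bwaidx && \
#     ./bwa mem -T 19 hg19bwaidx file_paired_1.fastq file_paired_2.fastq > aln.sam
```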
