Read group info
2.7 years ago
priya.bmg ▴ 60

Hello

I need help getting read group info for an alignment with BWA-MEM2. I read a previous post (bwa mem: Passing a variable to read group) where a shell script is used to get the read group info from the fastq file. Can someone explain what details should be given in the shell script? It would be of great help.

Thanks

Priya

BWA bwa-mem2 • 2.5k views

Making a vague reference to a previous post does not help you or us. Please provide a link for that post.

Sorry, I have given the link above.

Since you are interested in running bwa-mem2, you will need to edit the script and replace the bwa command with bwa-mem2. Otherwise you can use the answer in "bwa mem: Passing a variable to read group" and run the script as shown there: bwa-mapper.sh read_1.fq.gz read_2.fq.gz. Your read headers will need to follow the standard Illumina format.
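For reference, a standard Illumina (Casava 1.8+) read header has the form @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index. Below is a minimal sketch of pulling out the fields that are useful for a read group, assuming that layout. The function name parse_illumina_header and the example header are made up for illustration; they are not taken from the linked script.

```shell
# Hypothetical helper: print instrument, flowcell and lane from an Illumina header.
# Assumes the Casava 1.8+ layout: @instrument:run:flowcell:lane:tile:x:y ...
parse_illumina_header() {
    header=${1#@}                                        # drop the leading "@"
    instrument=$(printf '%s' "$header" | cut -d: -f1)    # field 1: instrument name
    flowcell=$(printf '%s' "$header" | cut -d: -f3)      # field 3: flowcell ID
    lane=$(printf '%s' "$header" | cut -d: -f4)          # field 4: lane number
    printf '%s %s %s\n' "$instrument" "$flowcell" "$lane"
}

# Example with a made-up header; for a real file you would use something like:
#   header=$(zcat read_1.fq.gz | head -n 1)
parse_illumina_header '@A00123:45:HXXXXXXXX:2:1101:1000:2000 1:N:0:ACGT'
```

If your headers do not split cleanly on ":" this way (for example older Illumina formats or reads downloaded from SRA), the field numbers will need adjusting.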

2.7 years ago
Ishak ▴ 20
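# Collect all forward fastq files in the directory given as $1
shopt -s nullglob
A=( "$1"/*1.fastq.gz "$1"/*1.fq.gz )

for i in "${!A[@]}"; do
    # The first line of the file is the header of the first read
    header=$(zcat "${A[i]}" | head -n 1)
    # Take the first colon-separated field and strip the leading "@"
    id=$(echo "$header" | cut -f 1 -d":" | sed 's/@//')
    echo "@RG\tID:$id"
done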

I hope it helps

Hello

I have paired-end sequences for 6 subjects. Should read group information be added to the BAM file for each subject? The read group info is different from subject to subject, right? If so, why collect all the forward fastq files together as in the code above? I am trying to understand the GATK pipeline for NGS analysis. Sorry for the silly question.

You will get one BAM file for each pair of fastq files. The read group should be the same for both fastq files of a pair. Usually you need the following fields:

ID: id of the sample
SM: sample name, e.g. ${A[i]//_1.@(fq|fastq).gz}
PL: platform, e.g. illumina
PU: platform unit
CN: name of the sequencing center (company)

This information is found in the header of the fastq file, as shown in the answer above.

As with ID, you can write code to extract the other fields and add them to the read group alongside ID, for example "@RG\tID:$id\tCN:GENOKS".
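As a rough sketch of how the fields could be combined for one subject, assuming Illumina (Casava 1.8+) headers and forward files named like subject1_1.fq.gz: the helper name build_read_group, the file name and the header are hypothetical. Using flowcell.lane for ID and PU is only one common convention, not something the pipeline requires.

```shell
# Hypothetical helper: build an @RG string for one pair of fastq files.
#   $1 = forward fastq file name, $2 = the first header line of that file
build_read_group() {
    fastq=$1
    header=${2#@}                                     # drop the leading "@"
    flowcell=$(printf '%s' "$header" | cut -d: -f3)   # field 3: flowcell ID
    lane=$(printf '%s' "$header" | cut -d: -f4)       # field 4: lane number
    # Sample name: file name without directory and without _1.fq.gz / _1.fastq.gz
    sample=$(basename "$fastq" | sed -E 's/_1\.(fq|fastq)\.gz$//')
    # The literal "\t" is kept on purpose: bwa converts it to a tab itself
    printf '@RG\\tID:%s.%s\\tSM:%s\\tPL:ILLUMINA\\tPU:%s.%s\n' \
        "$flowcell" "$lane" "$sample" "$flowcell" "$lane"
}

# Made-up example; with real data the header would come from
#   zcat subject1_1.fq.gz | head -n 1
build_read_group subject1_1.fq.gz '@A00123:45:HXXXXXXXX:2:1101:1000:2000 1:N:0:ACGT'

# The result would then be passed to the aligner once per subject, e.g.
#   bwa-mem2 mem -R "$rg" ref.fa subject1_1.fq.gz subject1_2.fq.gz
# (ref.fa is a placeholder for your indexed reference)
```

Because the string is built per pair of files, each of the 6 subjects gets its own SM value even though the same code runs in a loop over all forward files.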
