Question: What is ref sequence name in BAM file header ?
1
gravatar for newDNASeqer
4.5 years ago by
newDNASeqer620
United States
newDNASeqer620 wrote:

I am trying to use cn.mops to make CNV calls on my BWA-aligned BAM files, the getReadCountsFromBAM() function has an option for "seqRefName", and the explanation for this field is:

 

Name of the reference sequence that should be analyzed. The name must appear in the header of the BAM file. If it is not given the function will select the first reference sequence that appears in the header of the BAM files.

 

I used samtools view to view the header of my bam files, and here is what I see:

 

@HD     VN:1.4  SO:coordinate

@SQ     SN:chr1 LN:249250621

@SQ     SN:chr10        LN:135534747

@SQ     SN:chr11        LN:135006516

@SQ     SN:chr11_gl000202_random        LN:40103

@SQ     SN:chr12        LN:133851895

@SQ     SN:chr13        LN:115169878

@SQ     SN:chr14        LN:107349540

@SQ     SN:chr15        LN:102531392

@SQ     SN:chr16        LN:90354753

@SQ     SN:chr17        LN:81195210

@SQ     SN:chr17_ctg5_hap1      LN:1680828

@SQ     SN:chr17_gl000203_random        LN:37498

@SQ     SN:chr17_gl000204_random        LN:81310

@SQ     SN:chr17_gl000205_random        LN:174588

@SQ     SN:chr17_gl000206_random        LN:41001

@SQ     SN:chr18        LN:78077248

@SQ     SN:chr18_gl000207_random        LN:4262

@SQ     SN:chr19        LN:59128983

@SQ     SN:chr19_gl000208_random        LN:92689

@SQ     SN:chr19_gl000209_random        LN:159169

@SQ     SN:chr1_gl000191_random LN:106433

@SQ     SN:chr1_gl000192_random LN:547496

@SQ     SN:chr2 LN:243199373

@SQ     SN:chr20        LN:63025520

@SQ     SN:chr21        LN:48129895

@SQ     SN:chr21_gl000210_random        LN:27682

@SQ     SN:chr22        LN:51304566

@SQ     SN:chr3 LN:198022430

@SQ     SN:chr4 LN:191154276

@SQ     SN:chr4_ctg9_hap1       LN:590426

@SQ     SN:chr4_gl000193_random LN:189789

@SQ     SN:chr4_gl000194_random LN:191469

@SQ     SN:chr5 LN:180915260

@SQ     SN:chr6 LN:171115067

@SQ     SN:chr6_apd_hap1        LN:4622290

@SQ     SN:chr6_cox_hap2        LN:4795371

 

For the BWA alignment, I used hg19 as my reference, however, I do not see hg19 in the BAM headers, so what does it mean by seqRefName in cn.mop

One thing I've noticed is that getReadCountsFromBAM() seems using "chr1" as the default seqRefName when I don't specify it. So should I use a vector of chr(1..22) for seqRefName?

 

 

 

 

 

 

 

cnv cn.mops • 2.9k views
ADD COMMENTlink modified 4.5 years ago by Devon Ryan86k • written 4.5 years ago by newDNASeqer620
1

Yes, you need to use vector of all chr names if you would to use all chromosomes in the same time. Example is: refSeqName = c('@unitig_1066|quiver', '@unitig_1422|quiver', '@unitig_1441|quiver', '@unitig_223|quiver')

ADD REPLYlink written 3.7 years ago by Andrey Yurchenko10

Thanks. This was very helpful. I finally used 

fasta=readDNAStringSet("assembly.fasta") 

refSeqName=names(fasta)

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by Biomonika (Noolean)3.0k

I have the same problem, but I don´t understand your solution.

I just tried to copy and past, but nothing happen...

Can someone explain me, what I have to do? pls!

ADD REPLYlink written 2.2 years ago by karim20

Though I have never used cn.mops but based on whatever you posted it seems that by "reference sequence" cn.mops means "chromosomes" or "contigs" that were part of the reference fasta file against which the reads were aligned. 

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by Ashutosh Pandey11k
1
gravatar for Devon Ryan
4.5 years ago by
Devon Ryan86k
Freiburg, Germany
Devon Ryan86k wrote:

As you seem to have guessed, it's referring to the chromosome/contig names (i.e., what's in column 2 after SN:).

ADD COMMENTlink written 4.5 years ago by Devon Ryan86k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 783 users visited in the last hour