Question: Confusing header after mapping w/ STAR using Ensembl refgenome
3.3 years ago
umn_bist320 wrote:

After a very (very) fast alignment with STAR (2-pass mode) for my RNA-seq data, I viewed the header with samtools to make sure everything was sorted and aligned correctly. I used Gencode/Ensembl's latest reference genome build (GRCh38.p5).

Here is my STAR input:

${STAR} --runMode alignReads --twopassMode Basic --runThreadN 24 --outSAMtype BAM SortedByCoordinate --outSAMattributes All --outFileNamePrefix "${file1%_1.fastq}_tsta" --outSAMmapqUnique 255 --sjdbGTFfile "${sjdb}" --genomeDir "${STAR_index}" --readFilesIn "${file7}" "${file8}"

Here is my 'samtools view -H' output:

@SQ    SN:chr15    LN:101991189
@SQ    SN:chr16    LN:90338345
@SQ    SN:chr17    LN:83257441
@SQ    SN:chr18    LN:80373285
@SQ    SN:chr19    LN:58617616
@SQ    SN:chr20    LN:64444167
@SQ    SN:chr21    LN:46709983
@SQ    SN:chr22    LN:50818468
@SQ    SN:chrX    LN:156040895
@SQ    SN:chrY    LN:57227415
@SQ    SN:chrM    LN:16569
I'm going to assume that the alignment was done correctly and that the notations in the header represent (mostly) unplaced contigs, but I did not expect chr1-chr14 to be missing. Can I continue with this BAM file (after sorting and adding read groups) in a GATK variant-calling workflow (snpEff, MuTect2)?

I omitted some of the header to meet the character limit.

If you run samtools view -H <your.bam> | grep "chr14", does it return anything?

written 3.3 years ago by h.mon25k
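
The grep check suggested above can be extended to every primary chromosome at once. The following is only a sketch: the function name missing_chroms and the file name aligned.bam are made up for illustration, and it assumes UCSC-style names (chr1..chr22, chrX, chrY, chrM) as in the header shown, plus a tab-delimited header as samtools view -H prints it.

```shell
# Hypothetical helper: reads a SAM header on stdin and prints each primary
# chromosome (chr1..chr22, chrX, chrY, chrM) that has no @SQ line.
missing_chroms() {
    header=$(cat)
    for name in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M; do
        # Compare the whole SN: field so that chr1 does not match chr10
        # or an unplaced contig whose name merely starts with chr1.
        if ! printf '%s\n' "$header" | awk -F'\t' -v sn="SN:chr$name" '
            $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i == sn) found = 1 }
            END { exit !found }'; then
            printf 'chr%s\n' "$name"
        fi
    done
}

# Usage (aligned.bam is a placeholder for the STAR output file):
#   samtools view -H aligned.bam | missing_chroms
```

If this prints nothing, all 25 primary sequences are present in the header, and the apparent gap is more likely due to the header having been truncated when it was pasted.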
