Question: STAR alignment results in large chunks of NNNNNs
23 months ago by
Mark790
Mark790 wrote:

Hi Everyone,

I'm using STAR to align reads to a eukaryotic reference genome. I generated the index with a GTF file and then mapped the reads. I'm inspecting the alignment in Tablet, and very large chunks of missing data seem to be introduced. This is short-read data: 100 bp paired-end, stranded Illumina sequencing.

Here is an example:

SRR1106690.72357245

From: 23,097 U23,097 to 25,155 U25,155

Length: 2,059 U2,059 (1969 mismatches)

Cigar: 74M1969N16M

Read direction is FORWARD

SRR1106690.72357245 GTGGGTGTTGGTGAGGGCAGGTAATGCCAGGTATGAACCGGCACCTGACAGGGCTGGTGTAGTCACTGTCACCC [run of 1,969 Ns] CTTCCATGCACTTCTC

Can anyone shed light on why STAR is producing alignments like these? The majority of aligned reads look like this. Inspecting the alignment further, it's as if STAR is treating the paired-end data as single-end, combining the paired reads into one 200 bp read and then introducing Ns to fill the gap. Here are the commands I used to index and map:

 STAR --runThreadN 8 --runMode genomeGenerate --genomeDir X --genomeFastaFiles Y.fna --sjdbGTFfile Z.gtf --sjdbOverhang 199
 STAR --runThreadN 12 --genomeDir X --readFilesIn Y_1.fastq.gz Y.fastq.gz --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --sjdbGTFfile A.gtf

Thanks

sequencing rna-seq alignment • 619 views
23 months ago by
Devon Ryan96k
Freiburg, Germany
Devon Ryan96k wrote:

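The CIGAR string (74M1969N16M) indicates that the Ns aren't actually in the read. The N operation marks a skipped stretch of the reference, so the read is simply spliced across a 1,969 bp gap.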
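
To make the arithmetic concrete, here is a minimal Python sketch that splits a CIGAR string into its operations and sums how many bases each consumes on the reference and on the read, following the operation definitions in the SAM specification. The function name `cigar_spans` is made up for illustration; it is not part of STAR, Tablet, or any library.

```python
import re

# Per the SAM specification: which CIGAR operations advance along the
# reference, and which advance along the read sequence.
CONSUMES_REF = set("MDN=X")
CONSUMES_READ = set("MIS=X")


def cigar_spans(cigar):
    """Return (reference_span, read_length, skipped_bases) for a CIGAR string.

    Hypothetical helper for illustration only.
    """
    ref_span = read_len = skipped = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in CONSUMES_REF:
            ref_span += n
        if op in CONSUMES_READ:
            read_len += n
        if op == "N":  # skipped region of the reference (e.g. an intron)
            skipped += n
    return ref_span, read_len, skipped


ref_span, read_len, skipped = cigar_spans("74M1969N16M")
print(ref_span, read_len, skipped)  # 2059 90 1969
print(23097 + ref_span - 1)         # 25155, the end position Tablet reports
```

For the read in the question, only 90 bases come from the read itself (74 + 16). The 1,969 "N" positions exist only on the reference axis, which is why the alignment spans 2,059 bp from 23,097 to 25,155.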


As Devon explains, it's a spliced alignment. You can open your BAM file in IGV and look at this read, and you will see that it's spliced.

Written 23 months ago by Nicolas Rosewick

Hi Devon and Nicolas,

Thank you. It's very strange of Tablet to show splicing that way. Thanks for the help, guys. I should have looked at the CIGAR.

Thanks again

Written 23 months ago by Mark