Question: STAR alignment results in large chunks of NNNNNs
13 months ago by
Marks40
Marks40 wrote:

Hi Everyone,

I'm using STAR to align reads to a eukaryotic reference genome. I generated the index with a GTF file and then mapped the reads. I'm inspecting the alignment in Tablet, and very large chunks of missing data seem to be introduced. This is short-read data: 100 bp paired-end, stranded Illumina sequencing.

Here is an example:

SRR1106690.72357245

From: 23,097 U23,097 to 25,155 U25,155

Length: 2,059 U2,059 (1969 mismatches)

Cigar: 74M1969N16M

Read direction is FORWARD

SRR1106690.72357245 GTGGGTGTTGGTGAGGGCAGGTAATGCCAGGTATGAACCGGCACCTGACA GGGCTGGTGTAGTCACTGTCACCCNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTTCCAT GCACTTCTC

Can anyone shed light on why STAR is producing alignments like these? The majority of aligned reads look like this. Inspecting the alignment further, it's as if STAR is treating the paired-end data as single-end, combining the two mates into one 200 bp read and then introducing Ns to make it fit. Here are the commands I used to index and map:

 STAR --runThreadN 8 --runMode genomeGenerate --genomeDir X --genomeFastaFiles Y.fna --sjdbGTFfile Z.gtf --sjdbOverhang 199
 STAR --runThreadN 12 --genomeDir X --readFilesIn Y_1.fastq.gz Y.fastq.gz --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --sjdbGTFfile A.gtf

Thanks

sequencing rna-seq alignment • 426 views
modified 13 months ago by Biostar ♦♦ 20 • written 13 months ago by Marks40
13 months ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

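The CIGAR string (74M1969N16M) indicates that the Ns aren't actually in the read. The N operation marks 1,969 reference bases that the alignment skips over, so the read is simply spliced: 74 bases align on one side of the gap and 16 on the other.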
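To make the arithmetic concrete, here is a minimal Python sketch (not part of the original answer; the function name `cigar_summary` is made up for illustration). It tallies which CIGAR operations consume read bases and which consume reference bases, following the SAM specification, and applies that to the CIGAR from the question.

```python
import re

# Per the SAM specification: which operations consume bases of the read
# and which consume positions on the reference.
CONSUMES_READ = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def cigar_summary(cigar):
    """Return (read_bases, reference_span, skipped_bases) for a CIGAR string."""
    read_bases = ref_span = skipped = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in CONSUMES_READ:
            read_bases += n
        if op in CONSUMES_REF:
            ref_span += n
        if op == "N":  # skipped reference region, e.g. an intron
            skipped += n
    return read_bases, ref_span, skipped


read_bases, ref_span, skipped = cigar_summary("74M1969N16M")
print(read_bases, ref_span, skipped)  # 90 2059 1969
```

The 2,059 bp reference span matches the "Length" that Tablet reports (23,097 to 25,155 inclusive), while only 90 bases come from the read itself. The 1,969 "mismatches" are the skipped region that Tablet fills with Ns.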


As Devon explains, it's a splice junction. If you open your BAM file in IGV and look at this read, you will see that it's spliced.

written 13 months ago by Nicolas Rosewick8.3k

Hi Devon and Nicolas,

Thank you. It's very strange of Tablet to show splicing that way. Thanks for the help, guys; I should have looked at the CIGAR.

Thanks again

written 13 months ago by Marks40