bowtie2 alignment has 'Duplicated sequence'
6.1 years ago
rmash ▴ 20

I built an index using the following fasta file:

INDEX FASTA (example):

>miRNA:mmu-mir-23b MI0000141 Mus musculus miR-23b stem-loop
GGCTGCTTGGGTTCCTGGCATGCTGATTTGTGACTTGAGATTAAAATCACATTGCCAGGG
ATTACCACGCAACC
>miRNA:mmu-mir-27b MI0000142 Mus musculus miR-27b stem-loop
AGGTGCAGAGCTTAGCTGATTGGTGAACAGTGATTGGTTTCCGCTTTGTTCACAGTGGCT
AAGTTCTGCACCT

I then took my FASTQ data (example):

@K00252:57:HGFMMBBXX:1:1101:4320:1209 1:N:0:AGTCAA
NCCCTGTAGATCCGAATTTGTG
+
#AAFFJJJJJJJJJJJJJJJJJ
@K00252:57:HGFMMBBXX:1:1101:5132:1209 1:N:0:AGTCAA
NAACGGAATCCCAAAAGCAGCTG
+
#AAFFJJJJJJJJJJJJJJJJJJ

I used bowtie2 to align, but ended up with a bizarre SAM file that said I had duplicated entries.

OUTPUT SAM looks like this:

@HD     VN:1.0  SO:unsorted
@SQ     SN:CONTAMINATION:ADAPTER:adapters_contam1       LN:100
@SQ     SN:miRNA:mmu-let-7g     LN:88
@SQ     SN:miRNA:mmu-let-7i     LN:85
@SQ     SN:miRNA:mmu-mir-1a-1   LN:77
@SQ     SN:miRNA:mmu-mir-15b    LN:64
@SQ     SN:miRNA:mmu-mir-23b    LN:74
@SQ     SN:miRNA:mmu-mir-27b    LN:73
@SQ     SN:miRNA:mmu-mir-29b-1  LN:71
@SQ     SN:miRNA:mmu-mir-30a    LN:71
@SQ     SN:miRNA:mmu-mir-30b    LN:96

Any ideas what I might be doing wrong?

bowtie2 alignment SAM bowtie2-build • 3.5k views

I wonder if this has to do with mapping small RNAs which are likely to map to multiple regions?


You should use bowtie v.1 if you are mapping small RNAs and want ungapped alignments.

6.1 years ago
h.mon 35k

What you are seeing are the SAM headers, not alignments. Looks normal to me.

"that said I had duplicated entries"

How exactly did it say that? Do you mean the bowtie2 summary at the end of the alignment?
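
To make the header/alignment distinction concrete, here is a minimal sketch that counts header lines (those starting with `@`), `@SQ` reference entries, and alignment records in SAM text. The function name `summarize_sam` is made up for illustration, and the input is just a list of lines modelled on the header posted above.

```python
def summarize_sam(lines):
    """Count header lines, @SQ entries and alignment records in SAM text."""
    summary = {"header": 0, "sq": 0, "alignments": 0}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            # Header lines start with '@'; @SQ lines describe reference sequences.
            summary["header"] += 1
            if line.startswith("@SQ"):
                summary["sq"] += 1
        else:
            # Anything else is treated as an alignment record.
            summary["alignments"] += 1
    return summary


# Three header lines in the style of the posted SAM output, no alignments.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:miRNA:mmu-mir-23b\tLN:74",
    "@SQ\tSN:miRNA:mmu-mir-27b\tLN:73",
]
print(summarize_sam(example))
```

If `alignments` comes back as 0 for the whole file, the problem is in the alignment step; if it is non-zero, the records are simply further down, below the header.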


It doesn't seem to have any of the other tags (e.g. HI), and I can't seem to sort the SAM file.


Show the commands used and the error messages.


I am running a script, collapse.py (arguments: input, output), which has worked before:

rmash$ python collapse.py filename.sam filename.fasta
[W::sam_hdr_parse] Duplicated sequence 'rRNA:12S_rRNA'
Traceback (most recent call last):
  File "collapse.py", line 25, in <module>
    entries = [entry for entry in pysam.AlignmentFile(args["INFNAME"],"rb") if dict(entry.get_tags())["HI"] == 1]# only get the first entry in case of multi-mappers
KeyError: 'HI'
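
The `KeyError` means at least one record has no `HI` tag. As far as I know, bowtie2 does not write `HI` in its default output, so a filter that indexes `["HI"]` directly will fail on it. Below is a rough sketch of a more tolerant filter working on plain SAM text. The helper names `parse_tags` and `first_hits` are hypothetical, the example records are invented, and treating a record without `HI` as the first hit is my assumption, not necessarily what collapse.py intends.

```python
def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of a SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are mandatory
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def first_hits(sam_lines):
    """Keep records whose HI tag is 1, or which have no HI tag at all."""
    kept = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        # Assumption: a missing HI tag is treated as "first (only) hit".
        if parse_tags(line).get("HI", 1) == 1:
            kept.append(line)
    return kept


# Invented records: one without HI, then two hits of the same read with HI.
records = [
    "@SQ\tSN:miRNA:mmu-mir-23b\tLN:74",
    "r1\t0\tmiRNA:mmu-mir-23b\t1\t42\t4M\t*\t0\t0\tACGT\tFFFF\tAS:i:0\tYT:Z:UU",
    "r2\t0\tmiRNA:mmu-mir-23b\t5\t1\t4M\t*\t0\t0\tACGT\tFFFF\tHI:i:1",
    "r2\t256\tmiRNA:mmu-mir-27b\t9\t1\t4M\t*\t0\t0\tACGT\tFFFF\tHI:i:2",
]
print(len(first_hits(records)))
```

The same idea in the original list comprehension would be to use `.get("HI", 1)` on the tag dictionary instead of `["HI"]`.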

Instead of going back and forth for a long time, why don't you describe everything you did - which programs (possibly with links to their sites), command lines, reference genome used, and so on.

I believe collapse.py (I don't know which pipeline it comes from, as there are several collapse.py scripts around) is complaining about duplicated sequences in the reference, not in your reads. Check with:

grep "rRNA:12S_rRNA" filename.fasta

And:

samtools view -H filename.sam | grep "rRNA:12S_rRNA"
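
To check every sequence name at once rather than one name at a time, something like the sketch below could help. It counts FASTA header names, taking the name as the text after `>` up to the first whitespace, which matches how the `SN:` fields in the posted SAM header relate to the FASTA headers. The function name `duplicated_fasta_names` and the example entries are made up for illustration.

```python
from collections import Counter


def duplicated_fasta_names(lines):
    """Return {name: count} for FASTA sequence names that occur more than once."""
    names = Counter(
        line[1:].split()[0]  # name = text after '>' up to the first whitespace
        for line in lines
        if line.startswith(">") and line[1:].strip()
    )
    return {name: count for name, count in names.items() if count > 1}


# Invented example: the same name appears twice with different descriptions.
fasta_lines = [
    ">rRNA:12S_rRNA first copy",
    "ACGTACGT",
    ">miRNA:mmu-mir-23b MI0000141 Mus musculus miR-23b stem-loop",
    "GGCTGCTT",
    ">rRNA:12S_rRNA second copy",
    "ACGTACGT",
]
print(duplicated_fasta_names(fasta_lines))

# On a real file (the path is a placeholder):
# with open("reference.fasta") as handle:
#     print(duplicated_fasta_names(handle))
```

Any name reported here would end up as a repeated `@SQ` line in the SAM header, which is what the `sam_hdr_parse` warning is about.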

like this:

[W::sam_hdr_parse] Duplicated sequence 'rRNA:12S_rRNA'

and when you try to sort, it lists lots of sequences as being duplicated.
