How to convert output from bowtie1 to Genomic ranges
1
0
Entering edit mode
6.7 years ago

Hello, I have output from bowtie 1

bowtie  '~/UCSC/hg38/Sequence/BowtieIndex/genome'  -f '~/sample.fa'  
 -v 1  -k 1 -p 6 --al outputAligned --un outputNotAligned > randomtest1

that it looks like that:

    fread("randomtest1")

     V1      V2 V3    V4        V5                                     V6                                     V7 V8 V9
     1: t00000001 1148630  +  chrX  69672575               GGAATACCGGGTGCTGTAGGCTTT               IIIIIIIIIIIIIIIIIIIIIIII  1 NA
     2: t00000002 1078059  - chr11  64891202                  GGCTGTCAATTCATAGGTCAG                  IIIIIIIIIIIIIIIIIIIII  0 NA
     3: t00000003 1038720  + chr19  53787930                AAAGTGCTGCGACATTTGAGCGT                IIIIIIIIIIIIIIIIIIIIIII  0 NA
     4: t00000004 1027976  + chr13  91351360                 TATTGCACTTGTCCCGGCCTGT                 IIIIIIIIIIIIIIIIIIIIII  0 NA
     5: t00000005  948116  - chr17   1713913                 ACAGTTCTTCAACTGGCAGCTT                 IIIIIIIIIIIIIIIIIIIIII  0 NA
    ---                                                                                                                         
611991: t01137877       1  +  chr6  21215555                   TCCTTTCAGCTCTACAACTC                   IIIIIIIIIIIIIIIIIIII  0 NA
611992: t01137878       1  + chr12 110012992       ATCCTGGTAAGGTTGGACTGCAGCCCTTTTCT       IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA
611993: t01137880       1  +  chrM     11393 TCCACTTATGACTCCCTAAAGCCCATGTCGAAGCCCCC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA
611994: t01137885       1  +  chr7 133383687      TGTCGACTTTGATATTTTGGTTACCTGAGGATT      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA
611995: t01137887       1  - chr20  62414571      AGCCGAATGACCATGGCTGTGTGGAGCCGTGAC      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA

so the file is not a sam / bam or bed.

I believe that V5 has the "1-based offset into the forward reference strand where leftmost character of the alignment occurs" as described in [Bowtie Manual].[1] I checked the 1st read manually in Genome Browser, and it starts in chrX 69,672,576. So I can calculate the length of each sequence and add it to the V5 column +1, for the +strand. But I am not sure how to calculate the regions of the - strand.

I want to convert it to a Grange object

Thank you in advance

RNA-Seq GenomicRanges bowtie • 1.6k views
ADD COMMENT
0
Entering edit mode
6.7 years ago
ATpoint 89k

The simplest solution would be to make bowtie print output in sam format (option --sam), saving the output in BAM format, followed by reading the data into R with the GenomicAlignments package (readGAlignments function) which can then be converted to GRanges with granges(). Please check the documentation of the package.

bowtie --sam (your options...) | samtools view -o out.bam
ADD COMMENT
0
Entering edit mode

I would have done that but that output is a part of a workflow of a tool I use.

ADD REPLY
1
Entering edit mode

Ok I see. There is a script bowtie2sam.pl in the legacy folder of samtools that you could use to convert the bowtie output to SAM/BAM, followed by readGAlignments.

ADD REPLY
0
Entering edit mode

I tried the bowtie2sam.pl and got :

    Argument "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII" isn't numeric in numeric eq (==) at ~/bowtie2sam.pl line 70, <> line 861046.
Argument "chr1" isn't numeric in addition (+) at ~/bowtie2sam.pl line 67, <> line 861047.

and then with samtools:

[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

and finally with readGAlignments

readGAlignments("randometest.sam")
[bam_header_read] EOF marker is absent. The input is probably truncated. [bam_header_read] invalid BAM binary header (this is not a BAM file). Error in value[[3L]](cond) :   failed to open BamFile: SAM/BAM header missing or empty   file: 'randometest.sam'
ADD REPLY

Login before adding your answer.

Traffic: 3152 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6