Question: How to convert output from bowtie1 to Genomic ranges
0
gravatar for Konstantinos Yeles
8 days ago by
Italy/Salerno
Konstantinos Yeles40 wrote:

Hello, I have output from bowtie 1

bowtie  '~/UCSC/hg38/Sequence/BowtieIndex/genome'  -f '~/sample.fa'  
 -v 1  -k 1 -p 6 --al outputAligned --un outputNotAligned > randomtest1

that it looks like that:

    fread("randomtest1")

     V1      V2 V3    V4        V5                                     V6                                     V7 V8 V9
     1: t00000001 1148630  +  chrX  69672575               GGAATACCGGGTGCTGTAGGCTTT               IIIIIIIIIIIIIIIIIIIIIIII  1 NA
     2: t00000002 1078059  - chr11  64891202                  GGCTGTCAATTCATAGGTCAG                  IIIIIIIIIIIIIIIIIIIII  0 NA
     3: t00000003 1038720  + chr19  53787930                AAAGTGCTGCGACATTTGAGCGT                IIIIIIIIIIIIIIIIIIIIIII  0 NA
     4: t00000004 1027976  + chr13  91351360                 TATTGCACTTGTCCCGGCCTGT                 IIIIIIIIIIIIIIIIIIIIII  0 NA
     5: t00000005  948116  - chr17   1713913                 ACAGTTCTTCAACTGGCAGCTT                 IIIIIIIIIIIIIIIIIIIIII  0 NA
    ---                                                                                                                         
611991: t01137877       1  +  chr6  21215555                   TCCTTTCAGCTCTACAACTC                   IIIIIIIIIIIIIIIIIIII  0 NA
611992: t01137878       1  + chr12 110012992       ATCCTGGTAAGGTTGGACTGCAGCCCTTTTCT       IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA
611993: t01137880       1  +  chrM     11393 TCCACTTATGACTCCCTAAAGCCCATGTCGAAGCCCCC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA
611994: t01137885       1  +  chr7 133383687      TGTCGACTTTGATATTTTGGTTACCTGAGGATT      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA
611995: t01137887       1  - chr20  62414571      AGCCGAATGACCATGGCTGTGTGGAGCCGTGAC      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0 NA

so the file is not a sam / bam or bed.

I believe that V5 has the "1-based offset into the forward reference strand where leftmost character of the alignment occurs" as described in [Bowtie Manual].[1] I checked the 1st read manually in Genome Browser, and it starts in chrX 69,672,576. So I can calculate the length of each sequence and add it to the V5 column +1, for the +strand. But I am not sure how to calculate the regions of the - strand.

I want to convert it to a Grange object

Thank you in advance

ADD COMMENTlink modified 8 days ago by ATpoint13k • written 8 days ago by Konstantinos Yeles40
0
gravatar for ATpoint
8 days ago by
ATpoint13k
Germany
ATpoint13k wrote:

The simplest solution would be to make bowtie print output in sam format (option --sam), saving the output in BAM format, followed by reading the data into R with the GenomicAlignments package (readGAlignments function) which can then be converted to GRanges with granges(). Please check the documentation of the package.

bowtie --sam (your options...) | samtools view -o out.bam
ADD COMMENTlink written 8 days ago by ATpoint13k

I would have done that but that output is a part of a workflow of a tool I use.

ADD REPLYlink written 8 days ago by Konstantinos Yeles40
1

Ok I see. There is a script bowtie2sam.pl in the legacy folder of samtools that you could use to convert the bowtie output to SAM/BAM, followed by readGAlignments.

ADD REPLYlink written 8 days ago by ATpoint13k

I tried the bowtie2sam.pl and got :

    Argument "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII" isn't numeric in numeric eq (==) at ~/bowtie2sam.pl line 70, <> line 861046.
Argument "chr1" isn't numeric in addition (+) at ~/bowtie2sam.pl line 67, <> line 861047.

and then with samtools:

[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

and finally with readGAlignments

readGAlignments("randometest.sam")
[bam_header_read] EOF marker is absent. The input is probably truncated. [bam_header_read] invalid BAM binary header (this is not a BAM file). Error in value[[3L]](cond) :   failed to open BamFile: SAM/BAM header missing or empty   file: 'randometest.sam'
ADD REPLYlink modified 7 days ago • written 7 days ago by Konstantinos Yeles40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1440 users visited in the last hour