Question: Missing header sam file
21 months ago, G.Car wrote:

Hi everyone

I am trying to analyse some old data on STAT3 binding locations in macrophages upon IL-10 treatment. I found a dataset which perfectly matches what I want, however it was only available in bowtie format. My initial aim is to view it in the UCSC genome browser and quickly check for peaks at my genes of interest then move on to a more detailed analysis.

I managed (with some difficulty, as I'm very new to all this) to convert it into a SAM file, but when I try to upload it to UCSC I get an error. After spending a while trying to figure out what was wrong, I discovered that it's missing the @SQ header.

Now, I know that it has been mapped to mm9 as the reference genome, so I was hoping someone could help me generate a basic header.

I have read around and tried converting it to a BAM, using a FASTA file of mm9 chr1, but I get an error:

samtools view -bT Documents/chr2.fa Documents/ChIP/STAT3.txt > STAT3.bam
[samfaipath] build FASTA index...
[W::sam_parse1] urecognized reference name; treated as unmapped
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

I would appreciate any help people can provide, or alternative methods of generating the @SQ header. Please explain in detail: I'm new to all this and it takes me a while to understand exactly what I have to do.

Thank you!

Tags: bam, sam, chip-seq, header • 1.6k views
Modified 21 months ago by Pierre Lindenbaum • written 21 months ago by G.Car

What is the output of

head Documents/ChIP/STAT3.txt
Reply written 21 months ago by Pierre Lindenbaum
head Documents/ChIP/STAT3.txt

A204RKABXX:3:26:4008:112894#TGACCAAT 0 chr15 14328859 25 47M * TACCTTGCTTTGGGGATTACAGTTAAGTGACTGAATGAACCTCAGGA GGGGGGGGGGGGGGGGGGGGGGGGGGGFGEGGGGGGGGGGGGGDGGD NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:6009:112894#TGACCAAT 0 chr18 43952303 25 47M * CAGCCCAGTGTTCTTTATGTGGCGCCAAAATGCCCCTCCCCTTTAGT GFGGGGGGGGGGGGGGGGFGGGGGGGGGDFGGGGGGGGGGGGGGEGE NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:5766:112899#TGACCAAT 0 chr2 17589600 25 47M * AGGAAGACACTGGACTTTTTATGGCTGGTACTAGGCATATCTCCCTG GGGFFGGGFGGGGGGGGGGGEGEGGGG#################### NM:i:1 X1:i:1 MD:Z:27G16C2
A204RKABXX:3:26:8370:112900#TGACCAAT 16 chr8 64320362 25 47M * TCGCCTATTTTGTTAGTTTGAAACAACTATGCAGCCCTGAATGACTT GFGGGGFFFGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGFGGGGG NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:8496:112900#TGACCAAT 0 chr2 132770552 25 47M * TGGTTTTCCCACATTCCTTTCCTATCTCTCTGCGCCTTCAGTTTGGC EEEEGGGGGGGGGGGGGGGGGGEGGFGGGGGGGEEGFGGGGFGFDEB NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:8144:112893#TGACCAAT 0 chr3 27838344 25 47M * TTTGTTTGAAACAGTCTTCTGTAGCTCAGGCTGCACTCAAAGGCTAT GGGGGGGFGGGFGGGFGGDDDEFGGDFEFDEE?EEBEBDEEAFEEDA NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:12358:112894#TGACCAAT 0 chr4 134222144 25 47M * TGTTAGCCTTGGTTTCTGTTCCCGGCCATTCACACACAGCCCACCTC GGGFGGGGGGDFFGFGGFGFGDGGGFGGGEGEG:CCCCEEGG?EE?E NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:16734:112899#TGACCAAT 16 chr2 29721733 25 47M * GGTCGAGATCCAGAAGATCTGCTGTCTGGTGAGGACCTGTTCCTCAC ECBCGDGEGDFCFFCEBAEAGGGGEEGGGGFGGGEEGGGGEFGGGGG NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:16019:112892#TGACCAAT 16 chr16 85400695 25 47M * AGCAAACACCAGGAAAATAGCAGGATACTGTTGCTAAGGAAATGGGA GDFGFFFBFFFGFGGFGAGEGGGGFFFFEEGGGGEGGFGGEGFGGGG NM:i:0 X0:i:1 MD:Z:47
A204RKABXX:3:26:14189:112898#TGACCAAT 0 chr9 54525873 25 47M * CATTGGTATTTTGACTGCATGTCTGTCTGTGTTAGATCCCCTGGAAC EBFFFEFDEDEEAEDEE?DDCDCFFFBFEEAEECEAEEFFEEEEFEE NM:i:0 X0:i:1 MD:Z:47

Reply written 21 months ago by G.Car
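
The first warning in the original attempt (`urecognized reference name`) fits these records: the reads map to chr15, chr18, chr8 and several other chromosomes, whereas `-T Documents/chr2.fa` presumably supplies only one chromosome, so most reference names cannot be resolved. A minimal sketch for listing which reference names a SAM-like file actually uses, assuming tab-separated fields (the function name `count_rnames` is made up for illustration):

```shell
#!/bin/sh
# Count alignments per reference name (column 3, RNAME) in a SAM-like file.
# Assumption: fields are tab-separated; lines starting with "@" are headers.
count_rnames() {
    awk -F '\t' '!/^@/ && NF >= 3 { n[$3]++ }
                 END { for (r in n) printf "%s\t%d\n", r, n[r] }' "$@" | sort
}

# Demonstration on three made-up, truncated records (not real data).
printf 'r1\t0\tchr15\t100\nr2\t16\tchr2\t200\nr3\t0\tchr2\t300\n' | count_rnames
```

On the real file this would be called as `count_rnames Documents/ChIP/STAT3.txt`. Every name it prints needs a matching `@SQ` line in the header, which is why a whole-genome mm9 reference or dictionary is needed rather than a single chromosome.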
21 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

(Not tested) You can create a dict file using Picard's CreateSequenceDictionary (https://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary) and then create the BAM file:

cat new.dict Documents/ChIP/STAT3.txt | samtools view -Sb -o out.bam -

Thanks for the suggestion, but setting up Picard is way over my head when it comes to UNIX. I'm barely scraping by as it is, and setting up an environment variable to access Picard requires a level of understanding that I just don't have.

Reply written 21 months ago by G.Car
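
A header of this kind can also be written without Picard, since each `@SQ` line only needs a sequence name (`SN`) and a length (`LN`). A minimal sketch, assuming a two-column, tab-separated file of chromosome names and lengths is available for mm9 (a `chrom.sizes` file or the `.fai` index that `samtools faidx` writes both start with these two columns); the function name `make_sam_header` and the file names are made up for illustration:

```shell
#!/bin/sh
# Build a minimal SAM header: one @HD line plus one @SQ line per sequence.
# Input: a tab-separated file whose first two columns are name and length.
make_sam_header() {
    # $1: chromosome-sizes file (hypothetical name, e.g. mm9.chrom.sizes)
    printf '@HD\tVN:1.0\tSO:unsorted\n'
    awk -F '\t' 'NF >= 2 { printf "@SQ\tSN:%s\tLN:%s\n", $1, $2 }' "$1"
}

# Demonstration with two made-up entries; real names and lengths must
# come from the actual mm9 sizes or .fai file.
tmp=$(mktemp)
printf 'chrA\t1000\nchrB\t2500\n' > "$tmp"
make_sam_header "$tmp"
rm -f "$tmp"
```

The output can be redirected to a file and used in place of `new.dict` in the `cat ... | samtools view -Sb -o out.bam -` command above. The lengths must match the assembly the reads were aligned to.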

You can always use a dict from the web: http://dldcc-web.brc.bcm.edu/lilab/deqiangs/ref/bowtie/Mm9/mm9.dict, but you need to be sure that it is the very same reference genome (the UR and M5 tags are not required/checked).

Reply modified 21 months ago • written 21 months ago by Pierre Lindenbaum

Thanks, how would I go about checking it? It's definitely against mm9 but that's all I know.

Reply written 21 months ago by G.Car

just try....

Reply written 21 months ago by Pierre Lindenbaum

Hmm, no, I get another error:

[W::sam_read1] Parse error at line 24
[main_samview] truncated file.

Reply written 21 months ago by G.Car
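
The error moving from line 1 to line 24 is consistent with the header now being accepted and an alignment record being rejected. Judging from the records pasted earlier in the thread, each line goes straight from the `*` in column 7 (RNEXT) to the read sequence, so the mandatory PNEXT and TLEN columns (columns 8 and 9 of the 11 that SAM requires) appear to be missing. A minimal sketch that pads them with `0`, assuming the fields are tab-separated and that these two columns are the only problem (the function name `pad_sam_records` is made up for illustration):

```shell
#!/bin/sh
# Insert PNEXT=0 and TLEN=0 after column 7 (RNEXT) in records that lack them.
# Assumption: tab-separated fields; header lines (starting with "@") pass through.
pad_sam_records() {
    awk -F '\t' -v OFS='\t' '
        /^@/ { print; next }                 # keep header lines unchanged
        NF >= 9 && $8 !~ /^[0-9]+$/ {        # column 8 is not a position, so PNEXT/TLEN are absent
            $7 = $7 OFS "0" OFS "0"          # append the two missing columns after RNEXT
        }
        { print }
    ' "$@"
}

# Demonstration on one made-up record with the two columns missing.
printf 'read1\t0\tchr15\t14328859\t25\t4M\t*\tACGT\tGGGG\tNM:i:0\n' | pad_sam_records
```

Records that already have a numeric column 8 are left alone. If the file turns out to be space-separated rather than tab-separated, the `-F` and `OFS` settings would need changing, and the padded output should be checked with `head` before converting it to BAM.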