Question: FASTQ tagAlign file format from NCBI GEO
Sudhir Jadhao (India) wrote, 2.5 years ago:

Hello everyone,

I have ChIP-seq data files from NCBI GEO with the extensions *_R1.fastq.tagAlign.gz and *_R2.fastq.tagAlign.gz.

I can't work out which format this is, or how to convert it into a bedGraph file.

Format details for *_R1.fastq.tagAlign

First line of the file:

HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 1:N:0: + chr17 43607603 TGCACCACTGCATCTGGCCACAAACATTTTGTTTTTTTACTGTTCATTTT CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJGIJJJJJIJJJHIJIJJJJ 1

Format details for *_R2.fastq.tagAlign

First line of the file:

HWI-ST1052:49:C0ARLACXX:1:1101:1238:1955 2:N:0: - chr17 43607776 AGCTGGGATTACAGGTGCCTGCCACCACCCCCTGCTAATTTTTGTACTNT FFHHHHHHJJJJIJJJIJJJJJIHJHFJJJJJJJJJJHHHHHFFDBA1#C 0 1:T>N
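To make the columns concrete, here is a minimal Python sketch of a line parser. It assumes the file is tab-separated and that the layout follows Bowtie 1's default (non-SAM) output, which these lines appear to match: read name, strand, reference, 0-based leftmost offset, sequence, qualities, an integer count, and optional mismatch descriptors such as `1:T>N`. The `Alignment` and `parse_line` names are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One line of the file, assuming Bowtie 1's default (non-SAM) column layout."""
    name: str        # col 1: read name (contains a space here, e.g. "... 1:N:0:")
    strand: str      # col 2: "+" or "-"
    chrom: str       # col 3: reference name, e.g. "chr17"
    offset: int      # col 4: assumed 0-based leftmost position on the forward strand
    seq: str         # col 5: read sequence (reverse-complemented for "-" reads in Bowtie 1)
    qual: str        # col 6: Phred+33 qualities (reversed for "-" reads in Bowtie 1)
    other: int       # col 7: integer count (Bowtie 1 reports other alignment instances here)
    mismatches: str  # col 8: mismatch descriptors like "1:T>N"; empty when absent


def parse_line(line: str) -> Alignment:
    # Split on tabs only: the read name contains a space, so splitting on
    # arbitrary whitespace would shift every later column by one.
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 tab-separated fields, got {len(fields)}")
    mismatches = fields[7] if len(fields) > 7 else ""
    return Alignment(fields[0], fields[1], fields[2], int(fields[3]),
                     fields[4], fields[5], int(fields[6]), mismatches)
```

For a gzipped file, the lines can be fed in with `gzip.open(path, "rt")` from the standard library.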


This looks like a hybrid of the FASTQ, SAM, and tagAlign formats. You will probably need to do some processing of your own to get it into a standard format.

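Off the top of my head, I would pull columns 3, 4, the end coordinate (computed from col4 and the length of col5), then 1 and 2, in that order, to make a BED file. Standard BED6 expects a score column between the name and the strand, so put a placeholder score (e.g. 0) there if a downstream tool complains. From there, converting BED to bedGraph is straightforward, for example with bedtools genomecov -bg -i reads.bed -g chrom.sizes (placeholder file names; the BED input must be sorted by chromosome).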
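A minimal Python sketch of the column reordering and of the BED-to-bedGraph step follows. It assumes tab-separated input and that column 4 is a 0-based leftmost coordinate on both strands; if a strand-dependent rule turns out to apply instead, only the `end` line in `to_bed6` changes. The function names are hypothetical, the read name is truncated at its space (an arbitrary choice, since many BED readers split on whitespace), and the whole file is held in memory, so for large data a dedicated tool such as bedtools is likely the better route.

```python
import gzip
from collections import defaultdict
from typing import Iterable, Iterator


def to_bed6(line: str) -> str:
    """Reorder one alignment line into BED6: chrom, start, end, name, score, strand."""
    name, strand, chrom, offset, seq = line.rstrip("\n").split("\t")[:5]
    start = int(offset)
    # Assumption: col 4 is 0-based and leftmost on both strands, so the
    # half-open BED end is start + read length.
    end = start + len(seq)
    # Keep the part of the name before the space; "0" is a placeholder score.
    return "\t".join([chrom, str(start), str(end), name.split(" ")[0], "0", strand])


def bedgraph(intervals: Iterable[tuple[str, int, int]]) -> Iterator[str]:
    """Yield bedGraph lines (chrom, start, end, depth) for read coverage."""
    # Sweep line: +1 where a read starts, -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in intervals:
        events[chrom][start] += 1
        events[chrom][end] -= 1
    for chrom in sorted(events):  # lexicographic order, like `sort -k1,1`
        depth, prev = 0, None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0 and pos > prev:
                # Adjacent runs with equal depth are not merged.
                yield f"{chrom}\t{prev}\t{pos}\t{depth}"
            depth += events[chrom][pos]
            prev = pos


def file_to_bedgraph(path: str) -> Iterator[str]:
    """Read a gzipped alignment file and yield bedGraph lines of read coverage."""
    intervals = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.strip():
                continue
            chrom, start, end = to_bed6(line).split("\t")[:3]
            intervals.append((chrom, int(start), int(end)))
    yield from bedgraph(intervals)
```

Usage would be along the lines of `for row in file_to_bedgraph("sample_R1.fastq.tagAlign.gz"): print(row)`, with a hypothetical file name.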

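Be careful with the end coordinate on minus-strand reads. My first thought was col4 - length(col5) for minus-strand reads and col4 + length(col5) for plus-strand reads, but the column layout (including the trailing 1:T>N mismatch descriptor on the R2 line) looks like Bowtie 1's default output, in which column 4 is the 0-based offset of the leftmost aligned base on the forward strand for both strands. If that is what you have, the end is col4 + length(col5) regardless of strand, and because the offset is 0-based it can go straight into a BED file as the start. Please double-check this against your own data; I tend to switch strand operations on occasion.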
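The two candidate conventions for a read's coordinates can be compared with a small helper. This is a sketch: `read_interval` and its `pos_is_leftmost` flag are hypothetical names. `True` models the Bowtie 1 convention (column 4 is the leftmost base on both strands), and `False` models the strand-dependent rule, where a "-" read's position is treated as its right-hand end.

```python
def read_interval(strand: str, pos: int, read_len: int,
                  pos_is_leftmost: bool = True) -> tuple[int, int]:
    """Return a 0-based, half-open (start, end) interval for one read.

    pos_is_leftmost=True:  pos is the leftmost base on the forward strand for
                           both strands, so the read spans pos .. pos + read_len.
    pos_is_leftmost=False: strand-dependent rule; a "-" read spans
                           pos - read_len .. pos, a "+" read pos .. pos + read_len.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r}")
    if pos_is_leftmost or strand == "+":
        return pos, pos + read_len
    return pos - read_len, pos
```

For the R2 line above (strand "-", position 43607776, 50 bp), the two conventions give (43607776, 43607826) and (43607726, 43607776). The minus-strand reads shift by exactly one read length, which is enough to smear peaks if the wrong rule is used.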

Good luck!

Reply written 2.5 years ago by ejm32430