File formats for ATACseq data
1
0
Entering edit mode
4 days ago
c.eskiw • 0

Good day all,

I am trying to generate bedgraphs for some ATACseq data I have generated. I am using the Ubuntu 20.02 environment.

I have used trim_galore, Hisat2 and Masc2 (and MACS3) to map the data and peak call. All good here.

However, I am also trying to generate bedgraphs for visualization in the UCSC genome browser. BEDtools will not merge my replicates. Apparently I have issues with Hisat not generating files with e the chromosome in the first column position. I have tried bedtools and samtools to convert to 1) convert the files back to .bam files followed by indexing (generating .bai files that I have failed to utilize) and 2) realigning the data using STAR. I have run into nothing but issues. As follows are some of the error messages I have received:

(base) ceskiw@DESKTOP-4267CK1:~bamCoverage -p 7 -b /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam --skipNAs --normalizeUsing RPKM -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bedgraph -of bedgraph [E::idx_find_and_load] Could not retrieve index file for '/mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam' '/mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam' does not appear to have an index. You MUST index the file first! samtools sort -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam /mnt/e/CartilageATACseq_Aug20_21/Day0merge.bam  This did generate index .bai files. bamCoverage -b /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai --skipNAs --normalizeUsing RPKM -o /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bedgraph --outFileFormat bedgraph [E::hts_hopen] Failed to open file /mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai [E::hts_open_format] Failed to open file "/mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai" : Exec format error The file '/mnt/e/CartilageATACseq_Aug20_21/Day0merge_sort.bam.bai' does not exist  Any help is of course greatly appreciated. fasta .fq fastq • 210 views ADD COMMENT 1 Entering edit mode 4 days ago ATpoint 57k The -b option takes a BAM files, not the BAI file. If you provide the BAM the tool will automatically fetch the corresponding BAI file if it exists. Are you sure the BAI index exists? Usually that is created using samtools index using the sorted BAM file. Maybe the recent SAMtools versions do that automatically when using sort, I did not check that recently. ADD COMMENT 0 Entering edit mode Maybe the recent SAMtools versions do that automatically when using sort, I did not check that recently. Not unless you include --write-index in sort command AFAIK. ADD REPLY 0 Entering edit mode Nothing seemed to work on the first attempt here. I have stepped back to realign the reads with STAR. This is also failing. ceskiw@DESKTOP-4267CK1:~ STAR --runThreadN 7 --runMode genomeGenerate --genomeDir /mnt/f/ReferenceGenomes_GTFs/mouse2022.fa --genomeFastaFiles /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R2_trimmed.fasta
STAR --runThreadN 7 --runMode genomeGenerate --genomeDir /mnt/f/ReferenceGenomes_GTFs/mouse2022.fa --genomeFastaFiles /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R2_trimmed.fasta
STAR version: 2.7.9a   compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jan 14 09:28:43 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /mnt/f/ReferenceGenomes_GTFs/mouse2022.fa/Log.out. Will keep ./Log.out

Jan 14 09:28:43 ... starting to generate Genome files

EXITING because of INPUT ERROR: the file format of the genomeFastaFile: /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta is not fasta: the first character is '@' (64), not '>'.
Solution: check formatting of the fasta file. Make sure the file is uncompressed (unzipped).

Jan 14 09:28:43 ...... FATAL ERROR, exiting


I am starting to think that using Trim_galore is the issue with the organization of the output files? Any insight into if this is a possibility? I will switch to Trimmomatic to see if this makes a difference. I will also see if there is other output options for Trim_galore that may correct this.

1
Entering edit mode

Probably your entire script has flaws, such as naming fastq files as fasta, e.g. /mnt/f/CartilageATACseq_Aug20_21/Day0/day0_R1_trimmed.fasta. Consider using an end-to-end pipeline such as nf-core/atacseq.