Where to find the paired-end RNA reads in FASTQ format
7.4 years ago
lxwgcool ▴ 10

Hey guys,

I am trying to download the raw RNA reads (FASTQ files) for Mus musculus brain (GRCm38, release 79).

I found the BAM file on Ensembl; here is the link:

http://ftp.ensembl.org/pub/release-79/data_files/mus_musculus/GRCm38/rnaseq/GRCm38.sanger.brain.1.bam

However, I cannot find the corresponding FASTQ reads. Can anyone tell me where to find the paired-end RNA reads that belong to this BAM file?

I know the reads can be exported from the BAM file, but the exported reads are merged together, and I need the two mates of each pair in separate files. In addition, I think it would be more reliable to download the FASTQ files directly from a website.

Thanks so much

Any help is appreciated.

RNA-Seq • 1.5k views
7.4 years ago

You can use bamtofastq from bedtools like this to get the two mates of each pair in separate files:

bedtools bamtofastq -i http://ftp.ensembl.org/pub/release-79/data_files/mus_musculus/GRCm38/rnaseq/GRCm38.sanger.brain.1.bam -fq read1.fq -fq2 read2.fq

UPDATE:

The BAM file needs to be sorted by read name so that bamtofastq finds read1 and read2 of each pair one after the other. Step by step:

1 Download the BAM file (see also Note 1)

$ wget http://ftp.ensembl.org/pub/release-79/data_files/mus_musculus/GRCm38/rnaseq/GRCm38.sanger.brain.1.bam

2 Check the integrity of the downloaded BAM file

$ wget http://ftp.ensembl.org/pub/release-79/data_files/mus_musculus/GRCm38/rnaseq/md5sum.txt  
$ grep GRCm38.sanger.brain.1.bam md5sum.txt  > my.md5sum.txt 
$ md5sum -c my.md5sum.txt

The last md5sum command should print OK, else re-download the file.
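One caveat with the grep step above is that a plain grep for the BAM name would also pick up any line for a file whose name merely contains it (for example a .bam.bai index, if the list has one). A minimal sketch of a stricter check follows, assuming the list uses the usual md5sum layout of a hash followed by a file name or path. The function name verify_md5 is hypothetical and not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: verify one file against a checksum list whose lines
# look like "<md5>  <name-or-path>" (assumed layout, as written by md5sum).
verify_md5() {
    file="$1"
    list="$2"
    # Take the hash from the first line whose last path component is
    # exactly the file's base name (so "x.bam" does not match "x.bam.bai").
    expected=$(awk -v name="$(basename "$file")" '
        {
            n = split($NF, parts, "/")
            f = parts[n]
            sub(/^\*/, "", f)          # md5sum marks binary mode with "*"
            if (f == name) { print $1; exit }
        }' "$list")
    if [ -z "$expected" ]; then
        echo "MISSING"
        return 2
    fi
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$expected" = "$actual" ]; then
        echo "OK"
    else
        echo "FAILED"
        return 1
    fi
}

# Usage (file names from the steps above):
# verify_md5 GRCm38.sanger.brain.1.bam md5sum.txt
```

The helper prints OK, FAILED, or MISSING, so it can be dropped into a download script that retries on anything other than OK.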

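3 Sort the BAM file by read name (-n = sort by read name; -@ 14 = 14 threads). With samtools 1.3 or later the output file is given with -o; the older 0.1.x releases took an output prefix as the second argument instead.

samtools sort -n -@ 14 -o GRCm38.sanger.brain.1.sort.bam GRCm38.sanger.brain.1.bam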
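To confirm that the sort took effect before running the export, the header of the sorted file can be inspected. The sketch below assumes a reasonably recent samtools, which records the sort order in the SO: field of the @HD header line; very old releases may not update that field. The function name sort_order is hypothetical.

```shell
# Hypothetical helper: read SAM header text on stdin and print the SO: value
# of the @HD line, or "unknown" when no sort order is recorded.
sort_order() {
    awk -F '\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Usage (samtools view -H prints only the header):
# samtools view -H GRCm38.sanger.brain.1.sort.bam | sort_order
```

A name-sorted file would be expected to report queryname, while the file as downloaded would more likely report coordinate.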

4 Run bamtofastq on the sorted BAM file

bedtools bamtofastq -i GRCm38.sanger.brain.1.sort.bam -fq read1.fq -fq2 read2.fq
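After the export, it may be worth checking that read1.fq and read2.fq really are in step with each other. The following is a rough sketch, assuming plain 4-line FASTQ records and stripping a trailing /1 or /2 from the read names if present. The function name check_pairs is hypothetical.

```shell
# Hypothetical helper: walk two FASTQ files record by record and compare
# the read names. Assumes 4-line records; only names are compared.
check_pairs() {
    awk -v mate_file="$2" '
        NR % 4 == 1 {
            total++
            name = $1
            sub(/\/[12]$/, "", name)
            # Read the header of the matching record from the second file.
            if ((getline mate < mate_file) <= 0) { bad++; next }
            # Skip the sequence, separator and quality lines of that record.
            for (i = 0; i < 3; i++) getline skipped < mate_file
            sub(/[ \t].*$/, "", mate)
            sub(/\/[12]$/, "", mate)
            if (mate != name) bad++
        }
        END {
            # Records left over in the second file have no partner either.
            while ((getline mate < mate_file) > 0) extra++
            printf "%d pairs checked, %d mismatched\n", total, bad + int(extra / 4)
        }
    ' "$1"
}

# Usage:
# check_pairs read1.fq read2.fq
```

The check streams both files, so it does not need to hold the read names in memory.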

Notes

1 If you have axel installed, you can speed up the download by using multiple connections. For example, with 14 connections:

axel -a -n 14 http://ftp.ensembl.org/pub/release-79/data_files/mus_musculus/GRCm38/rnaseq/GRCm38.sanger.brain.1.bam

2 Regarding this concern from the question:

"In addition, i think it will be more guarantee if we could download the fastq file from some websites directly."

There is nothing to worry about. The sequences and quality scores are all present in a standard, well-formed BAM file, unless the reads were hard-clipped or the BAM file was filtered.

PS: I searched but could not find the FASTQ files on the web.


Thanks so much for your suggestion. I just compiled bedtools and tried bamtofastq. Some of the reads were exported successfully. However, I got a lot of warnings like the one below:

*WARNING: Query ERR033016.15707411 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.

It seems that some pairs fail to be exported because one mate of the pair is missing. Do you know how to fix this problem? Or is there a website where I can download those reads directly?

Thanks so much for your help.


The BAM file needs to be sorted by read name. See the updated reply.
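To get a feel for how many reads would trigger that warning, one can count the runs of identical adjacent read names that are not exactly two records long. This is only a sketch: the function name count_unpaired_runs is hypothetical, and it assumes one read name per line on standard input.

```shell
# Hypothetical helper: read one read name per line on stdin and count the
# runs of identical adjacent names whose length is not exactly 2.
count_unpaired_runs() {
    uniq -c | awk '$1 != 2 { n++ } END { print n + 0 }'
}

# Usage (-F 0x900 drops secondary and supplementary alignments, which would
# otherwise add extra records for the same read name):
# samtools view -F 0x900 GRCm38.sanger.brain.1.bam | cut -f 1 | count_unpaired_runs
```

On a coordinate-sorted file the count is expected to be very large, because mates are rarely adjacent. After sorting by read name it should drop to the number of reads whose mate is genuinely absent from the file.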
