BAM -> FASTQ Conversion of CCLE data for STAR-Fusion. Filtering Steps?
4.6 years ago
denis.k ▴ 20

Hey everyone,

I'm pretty new to RNA sequencing and was wondering if anyone could help me out. I am trying to run a variety of SV callers (STAR-Fusion, etc.) on data from the CCLE (https://portal.gdc.cancer.gov/legacy-archive).

Most SV callers require .fastq files, but all the data I have downloaded is in BAM format. Here are some more details:

First, the BAM files are coordinate-sorted. After realizing that they need to be sorted by name for the paired .fastq files to be created correctly, I sorted all files by name using Samtools 1.9:

samtools sort -n -o outfile_sorted.bam infile.bam

Then:

samtools fastq -1 outfile_sorted_1.fastq.gz -2 outfile_sorted_2.fastq.gz outfile_sorted.bam
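For reference, here is a minimal sketch of the same two steps chained into one pipeline, with the filtering spelled out. The function name `bam_to_paired_fastq` and the file names are hypothetical placeholders. It relies on `samtools fastq` excluding secondary and supplementary records by default (`-F 0x900`), and it sends singletons and reads without a READ1/READ2 flag to `/dev/null`, on the assumption that only properly paired mates are wanted in the two output files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name-sort a BAM and write paired FASTQ files.
# Usage (placeholder names): bam_to_paired_fastq infile.bam outfile_sorted
bam_to_paired_fastq() {
    local in_bam="$1" prefix="$2"

    # Step 1: name-sort and stream BAM to stdout (no intermediate file).
    # Step 2: convert to FASTQ, reading the sorted BAM from stdin ("-").
    #   -F 0x900  skip secondary and supplementary records (the default,
    #             written out here so the filter is explicit)
    #   -0 / -s   discard unflagged reads and singletons (an assumption:
    #             only paired mates are kept)
    samtools sort -n -O bam "$in_bam" |
        samtools fastq -F 0x900 \
            -1 "${prefix}_1.fastq.gz" -2 "${prefix}_2.fastq.gz" \
            -0 /dev/null -s /dev/null -
}
```

As far as I understand, dropping secondary and supplementary records does not drop the read itself: the primary record for each mate still carries its sequence, so the read still ends up in the .fastq output exactly once.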

Is this process enough to feed the .fastq reads into the SV caller? I figured that if I filtered out non-primary reads, the reads corresponding to fusions would be filtered out as well. I'm seeing a LOT of duplicated sequences in my QC reports, but I figured that wasn't a problem. I just want to make sure that I'm not keeping a bunch of artifacts in my .fastq files and potentially making my whole project useless.
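One cheap sanity check on the conversion is to confirm that both mate files contain the same number of records. The sketch below assumes gzip-compressed FASTQ with the usual four lines per record; `count_fastq_reads` and `check_pairs` are hypothetical helper names, not part of samtools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of records in a gzipped FASTQ file,
# assuming exactly four lines per record.
count_fastq_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Hypothetical helper: print both counts and return success only if the
# R1 and R2 files hold the same number of records.
check_pairs() {
    local n1 n2
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    echo "R1=${n1} R2=${n2}"
    [ "$n1" -eq "$n2" ]
}

# Example call (placeholder file names):
# check_pairs outfile_sorted_1.fastq.gz outfile_sorted_2.fastq.gz
```

Equal counts do not prove that the mates are in the same order, but unequal counts are a clear sign that the two files are out of sync.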

sequence gene RNA • 1.2k views