Question: BAM -> FASTQ Conversion of CCLE data for STAR-Fusion. Filtering Steps?
16 days ago, denis.k wrote:

Hey everyone,

I'm pretty new to RNA sequencing and was wondering if anyone could help me out. I am trying to run a variety of SV callers (STAR-Fusion, etc.) on data from the CCLE (https://portal.gdc.cancer.gov/legacy-archive).

Most SV callers require FASTQ files, but all the data I have downloaded is in BAM format. Here are some more details:

Firstly, the BAM files are coordinate sorted. After realizing that they need to be sorted by read name for the paired FASTQ files to be created correctly, I sorted all files by name.

I am using Samtools 1.9.

samtools sort -n -o outfile_sorted.bam infile.bam

Then:

samtools fastq -1 outfile_sorted_1.fastq.gz -2 outfile_sorted_2.fastq.gz outfile_sorted.bam
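One way to check that the two output files really are paired correctly is to compare their read names record by record. The sketch below is only an illustration: `fastq_names` and `check_fastq_pairs` are hypothetical helper names (not samtools commands), and it assumes gzip-compressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper: print the read name of every record in a gzipped FASTQ file.
# Assumes 4-line records; strips the leading "@", any comment after the first
# whitespace, and a trailing /1 or /2 mate suffix.
fastq_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { n = $1; sub(/^@/, "", n); sub(/\/[12]$/, "", n); print n }'
}

# Hypothetical helper: report read counts and the number of positions where the
# names in the two files differ. Exits 0 only if the files line up.
check_fastq_pairs() (
    tmp1=$(mktemp) || exit 2
    tmp2=$(mktemp) || exit 2
    fastq_names "$1" > "$tmp1"
    fastq_names "$2" > "$tmp2"
    n1=$(wc -l < "$tmp1")
    n2=$(wc -l < "$tmp2")
    # paste puts the two name lists side by side; unequal lengths also show up
    # as mismatches because the shorter file yields empty fields.
    mismatches=$(paste "$tmp1" "$tmp2" | awk -F'\t' '$1 != $2' | wc -l)
    rm -f "$tmp1" "$tmp2"
    echo "reads_1=$((n1)) reads_2=$((n2)) name_mismatches=$((mismatches))"
    [ "$((n1))" -eq "$((n2))" ] && [ "$((mismatches))" -eq 0 ]
)
```

With the file names from the command above, this would be called as `check_fastq_pairs outfile_sorted_1.fastq.gz outfile_sorted_2.fastq.gz`; equal counts and zero mismatches suggest the mates are in the same order in both files.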

Is this process enough to feed the FASTQ reads into the SV caller? I figured that if I filtered out non-primary alignments, the reads corresponding to fusions would be filtered out as well. I'm seeing a LOT of duplicated sequences in my QC reports, but I figured that wasn't a problem. I just want to make sure that I'm not keeping a bunch of artifacts in my FASTQ files and potentially making my whole project useless.

rna sequence gene • 85 views