We encourage others to use these data in their research! Please refer to the project authors as the West Coast Dream Team collaboration rather than individual researchers when describing this project in the text of a manuscript or abstract.
The genomic and epigenomic landscape of double-negative metastatic prostate cancer
The West Coast Dream Team collaboration group performed whole-genome sequencing (WGS), whole-genome bisulfite sequencing (WGBS), and RNA sequencing of 134 tumor biopsies from patients with metastatic castration-resistant prostate cancer (Lundberg et al. Cancer Research 2023). Scripts employed during the analysis are available on GitHub:
https://github.com/DavidQuigley/WCDT_subtypes. A total of 210 RNA-seq experiments were available for this study; only some of these patients also had WGS and WGBS data. The samples with RNA-seq data in Lundberg et al. Cancer Research 2023 are a superset of samples characterized in Quigley et al. Cell 2018, described below.
mRNA data
- RNA TPM data as processed and used in this paper can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/Lundberg_CR_2023_TPM_210_samples.txt.gz (18 MB gzipped file)
- RNA transcript-level count data as processed and used in this paper can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/Lundberg_CR_2023_Counts_210_samples.txt.gz (8 MB gzipped file)
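Once downloaded, the gzipped matrices above can be read without decompressing them to disk. The following is a minimal sketch using only the Python standard library. It assumes the files are tab-delimited, with a header row of sample identifiers and the feature identifier in the first column; the layout is not documented on this page, so check it against the actual file before relying on this. The function names are illustrative and are not part of the WCDT scripts.

```python
import csv
import gzip
import io


def read_expression_matrix(handle):
    """Parse a tab-delimited matrix from an open text handle.

    ASSUMPTION: the first row is a header and the first column of every
    row holds the feature (gene or transcript) identifier.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


def read_gzipped_matrix(path):
    """Open a .txt.gz file in text mode and parse it."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_expression_matrix(handle)


# Small synthetic example standing in for the real download.
demo = "gene\tS1\tS2\nAR\t10.5\t0.0\nTP53\t3.0\t4.5\n"
demo_samples, demo_matrix = read_expression_matrix(io.StringIO(demo))
print(demo_samples, demo_matrix["AR"])
```

For the real data, call `read_gzipped_matrix` with the local path of the downloaded file. If the header turns out to have one fewer field than the data rows (as some R exports do), use the whole header as the sample list instead of `header[1:]`.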
Genomic Hallmarks and Structural Variation in Metastatic Prostate Cancer
The West Coast Dream Team collaboration group performed whole-genome DNA sequencing and RNA sequencing of 101 tumor biopsies from patients with metastatic castration-resistant prostate cancer (Quigley, Dang, Zhao et al. Cell 2018). Scripts employed during the analysis are available on GitHub:
https://github.com/DavidQuigley/WCDT.
Whole genome DNA sequence data
- Raw whole genome data are subject to controlled access, as they include germline genomes from the individuals who generously donated tissue samples for this study.
The whole genome and transcriptome data have been deposited with the NCBI and are available at the Genomic Data Commons. Requests for access to these data should go through those channels.
- DNA copy number calls that were used in the manuscript can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/2018_04_15_matrix_CN_integer_symbol_copycat.txt.zip (3 MB text file). This file contains consensus copy number calls for 29,798 transcripts across 101 samples.
- DNA BED files that were used in the manuscript can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/WCDT_copycat_bedfiles.tar.gz (771 KB tarball). This file contains a BED file generated by a call to CopyCat for each of the 101 samples. In our analysis we used the following bounds for copy number calling:
chr1-chr22 gain / shallow loss / deep loss: 3 / 1.65 / 0.6
chrX, chrY gain / loss: 1.4 / 0.6
- Somatic VCF files generated by Strelka, filtered to include only those calls assigned a quality flag of PASS, can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/2018_04_15_WCDT_somatic_vcf.tar (78.5 MB tar file).
- Somatic structural variant data derived from calls made by Manta, filtered to include only those calls convincingly observed only in tumors, can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/2018_04_15_list_manta_SV.zip (500 KB text file).
- Somatic structural variant summary table with the summary values plotted in Figure 3A:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/2021_09_25_SV_summary_table.txt
- Supplementary Table 1 showing sample summaries (including HRD assignments):
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/supplementary_table_S1.xlsx
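To make the copy number bounds listed with the CopyCat BED files concrete, here is a minimal sketch of how a copy number estimate might be turned into a call. The numeric bounds come from this page. The comparison directions, the inclusive boundaries, the "neutral" label, and the function name are assumptions made for illustration; the scripts in the WCDT repository remain the reference for how calls were actually made.

```python
# Bounds quoted on this page for copy number calling.
AUTOSOME_BOUNDS = {"gain": 3.0, "shallow_loss": 1.65, "deep_loss": 0.6}
SEX_CHROM_BOUNDS = {"gain": 1.4, "loss": 0.6}


def classify_copy_number(chrom, copies):
    """Label one segment given its chromosome and copy number estimate.

    ASSUMPTIONS: a value at or above the gain bound is a gain, a value at
    or below a loss bound is a loss, and anything in between is "neutral".
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name in ("X", "Y"):
        if copies >= SEX_CHROM_BOUNDS["gain"]:
            return "gain"
        if copies <= SEX_CHROM_BOUNDS["loss"]:
            return "loss"
        return "neutral"
    if copies >= AUTOSOME_BOUNDS["gain"]:
        return "gain"
    # Test the deep-loss bound first, since it lies below the shallow one.
    if copies <= AUTOSOME_BOUNDS["deep_loss"]:
        return "deep_loss"
    if copies <= AUTOSOME_BOUNDS["shallow_loss"]:
        return "shallow_loss"
    return "neutral"


for chrom, copies in [("chr1", 3.2), ("chr8", 1.5), ("chr10", 0.4), ("chrX", 1.0)]:
    print(chrom, copies, classify_copy_number(chrom, copies))
```

The sex chromosomes use lower bounds because these samples come from male patients, who carry a single copy of chrX and chrY.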
mRNA data
Laser-capture microdissected tumor tissue was subjected to RNA-seq. RNA reads were then aligned against HG38-decoy using STAR as described in the manuscript, producing per-gene count files (see below). RNA data were available for 99 of the 101 samples with DNA-seq data, so please expect to see 99 columns in matrix files for this study. A total of 26,485 transcripts were assessed for counts. Count files were then processed to calculate TPM values with the script at https://github.com/DavidQuigley/WCDT/scripts/calculate_RNA_tpm.R, which uses code adapted from https://gist.github.com/slowkow/c6ab0348747f86e2748b. This script marked a gene as absent if no sample had at least 100 counts for that gene and the mean number of counts across all 101 samples was less than 100. After filtering, 16,844 genes with TPM calls were included in the analysis. You can re-process the count data to your own satisfaction using the raw data linked below.
- RNA count data can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/2018_04_15_matrix_rna_counts.txt.zip (2.9 MB text file)
- Processed mRNA data (TPM calls) can be downloaded from the URL:
https://quigleylab.s3.us-west-2.amazonaws.com/datasets/2018_04_15_matrix_rna_tpm.txt.zip (4.4 MB text file)
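If you re-process the count data yourself, the filtering rule and TPM conversion described above can be sketched as follows. This is not a port of calculate_RNA_tpm.R. It uses the textbook TPM definition (counts divided by feature length, then scaled so each sample sums to one million), whereas the R script may apply an effective-length correction. The gene lengths, gene names, and function names below are hypothetical inputs for illustration.

```python
def is_absent(counts, min_count=100):
    """Apply the rule described on this page: a gene is marked absent if
    no sample reaches min_count AND its mean count is below min_count."""
    no_sample_reaches = max(counts) < min_count
    low_mean = sum(counts) / len(counts) < min_count
    return no_sample_reaches and low_mean


def counts_to_tpm(counts_by_gene, length_by_gene):
    """Convert per-gene counts to TPM, one sample (column) at a time.

    ASSUMPTION: plain feature length in bases is used, with no
    fragment-length adjustment.
    """
    genes = list(counts_by_gene)
    n_samples = len(next(iter(counts_by_gene.values())))
    tpm = {gene: [0.0] * n_samples for gene in genes}
    for j in range(n_samples):
        rates = {g: counts_by_gene[g][j] / length_by_gene[g] for g in genes}
        total = sum(rates.values())
        for g in genes:
            tpm[g][j] = rates[g] / total * 1e6 if total > 0 else 0.0
    return tpm


# Hypothetical counts for three genes in two samples.
counts = {"geneA": [200, 50], "geneB": [10, 20], "geneC": [1000, 3000]}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}

tpm = counts_to_tpm(counts, lengths)
kept = {g: values for g, values in tpm.items() if not is_absent(counts[g])}
print(sorted(kept))
```

Whether the original script filtered before or after computing TPM is not stated on this page. The sketch computes TPM over all genes and filters afterwards, which leaves the values of the retained genes unaffected by the filter.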
HTDoseResponseCurve
Source code for the R package HTDoseResponseCurve is available on GitHub at https://github.com/DavidQuigley/HTDoseResponseCurve.
A pre-built source package (HTDoseResponseCurve_0.99.0.tar.gz) is available.