broadinstitute/cellpainting-gallery

Cell Painting Gallery

This page provides a guide to the datasets that are available in the Cell Painting Gallery, hosted by the AWS Registry of Open Data (RODA): https://registry.opendata.aws/cellpainting-gallery

Citation/license

All data is released under CC0 1.0 Universal (CC0 1.0). Still, professional ethics require that you cite the appropriate resources and publications, listed below, when using individual datasets. For example:

We used the dataset cpg0000 (Chandrasekaran et al., 2022), available from the Cell Painting Gallery on the Registry of Open Data on AWS (https://registry.opendata.aws/cellpainting-gallery/).

Documentation

Please see our documentation for extensive supporting information.

Available datasets

All datasets were generated using the Cell Painting assay unless indicated otherwise. Several versions of that protocol exist (see the Cell Painting wiki).

Each dataset is stored under the prefix given by its dataset name. For example, the first dataset is located at s3://cellpainting-gallery/cpg0000-jump-pilot and can be listed with the AWS CLI: aws s3 ls --no-sign-request s3://cellpainting-gallery/cpg0000-jump-pilot/ (note the trailing /). See "browsing data" in our documentation for more information on viewing the gallery in a browser and for examples of how to list files using the AWS CLI or boto3.
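As a rough boto3 counterpart to the AWS CLI command above, the sketch below lists the top level of a dataset prefix using unsigned (anonymous) requests. The helper names `dataset_prefix` and `list_top_level` are hypothetical and not part of the gallery's documentation. The code assumes boto3 is installed and that the bucket allows anonymous listing, as the `--no-sign-request` flag suggests.

```python
from typing import Iterator

# Bucket name taken from the S3 URI shown above.
BUCKET = "cellpainting-gallery"


def dataset_prefix(dataset_name: str) -> str:
    """Return the S3 key prefix for a dataset, always ending in '/'."""
    return dataset_name.rstrip("/") + "/"


def list_top_level(dataset_name: str) -> Iterator[str]:
    """Yield the immediate sub-prefixes and object keys under a dataset.

    Hypothetical helper: requires boto3 and network access.
    """
    # Imported lazily so dataset_prefix() works without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Unsigned requests play the role of --no-sign-request in the AWS CLI.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=BUCKET,
        Prefix=dataset_prefix(dataset_name),
        Delimiter="/",  # stop at the first level instead of recursing
    )
    for page in pages:
        for common in page.get("CommonPrefixes", []):
            yield common["Prefix"]
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Iterating over `list_top_level("cpg0000-jump-pilot")` should yield roughly the same entries as the `aws s3 ls` command. Using the paginator avoids silently truncating prefixes with many entries.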

A dataset's accession number is the first seven characters of its name. For example, the accession number of the first dataset is cpg0000.
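The accession rule above can be expressed in a few lines. This is a minimal sketch: the function name `accession_number` is hypothetical, and the validation pattern ("cpg" followed by four digits) is an assumption inferred from the dataset names listed in this README, not a documented specification.

```python
import re

# Assumed shape of an accession number, inferred from names such as
# "cpg0000-jump-pilot"; not an official specification.
ACCESSION_PATTERN = re.compile(r"cpg\d{4}")


def accession_number(dataset_name: str) -> str:
    """Return the first seven characters of a dataset name."""
    accession = dataset_name[:7]
    if not ACCESSION_PATTERN.fullmatch(accession):
        raise ValueError(f"unexpected dataset name: {dataset_name!r}")
    return accession
```

For instance, `accession_number("cpg0016-jump")` returns `"cpg0016"`, while a name that does not start with an accession raises `ValueError`.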

| Dataset name | Description | Publication to cite | Associated repositories | Total size | Images size | Numerical data size | Cell Painting protocol | Other aliases |
|---|---|---|---|---|---|---|---|---|
| cpg0000-jump-pilot | 300+ compounds and 160+ genes (CRISPR knockout and overexpression) profiled in A549 and U2OS cells, at two timepoints | (Chandrasekaran et al., 2024) Publication, Preprint. Description of Cell Painting v2.5. | data | 12.3 TB | 6.1 TB | 6.1 TB | v2.5 | |
| cpg0001-cellpainting-protocol | 300+ compounds profiled in U2OS cells using several different modifications of the Cell Painting protocol | (Cimini et al., 2022) Publication, Preprint. Description of Cell Painting v3. | data | 40.3 TB | 18.7 TB | 21.6 TB | v3 and experiments | |
| cpg0002-jump-scope | 90 compounds (JUMP-MOA plate) profiled in U2OS cells using different microscopes and settings | (Tromans-Coia and Jamali et al., 2023) Publication, Preprint | data, analysis | 16.7 TB | 12.5 TB | 4.2 TB | v2.5 | |
| cpg0003-rosetta | 28,000+ genes and compounds profiled in Cell Painting and L1000 gene expression | (Haghighi et al., 2022) Publication, Preprint | data | 8.5 GB | 0 | 8.5 GB | | |
| cpg0004-lincs | 1,571 compounds across 6 doses in A549 cells | (Way et al., 2022) Publication, Preprint | data | 65.7 TB | 61.9 TB | 3.8 TB | v2 | idr0125 |
| cpg0010-caie-drugresponse | MCF-7 breast cancer cells treated with 113 small molecules at eight concentrations | (Caie et al., 2010) Publication | | 239.2 GB | 98.4 GB | 140.8 GB | other variation | BBBC021 |
| cpg0011-lipocyteprofiler | Variety of lipocytes in different metabolic states and with genetic and drug perturbations | (Laber and Strobel et al., 2023) Publication, Preprint. Description of Cell Painting lipocyte variant. | analysis | 1.2 TB | 1.2 TB | 16 MB | lipocyte | |
| cpg0012-wawer-bioactivecompoundprofiling | 30,000 compound dataset in U2OS cells | (Wawer et al., 2014) Publication. Description of Cell Painting v1. (Bray et al., 2017) Publication. Description of Cell Painting v2. | data | 10.7 TB | 3.1 TB | 7.6 TB | v1 | idr0016, CDRP, BBBC036, BBBC047 |
| cpg0015-heterogeneity | 2,200+ compounds and 200+ genes profiled in U2OS cells | (Rohban et al., 2019) Publication | data | 204 GB | 0 | 204 GB | | idr0016, idr0036, idr0033 |
| cpg0016-jump | 116,000+ compounds and 16+ genes (CRISPR knockout and overexpression) profiled in U2OS cells. Over 8 million images (>126 TB), over 1.5 billion cells of numerical data (>126 TB), for over 250 TB data in total. | (Chandrasekaran et al., 2023) Preprint | resource | 358.4 TB | | | v3 | |
| cpg0017-rohban-pathways | 323 genes overexpressed in U2OS cells. Original images re-profiled in 2023. | (Rohban et al., 2017) Publication, Preprint | re-profiled data, original data | 321 GB | 189 GB | 132 GB | v1 | BBBC037, TA-ORF |
| cpg0018-singh-seedseq | U2OS cells treated with each of 315 unique shRNA sequences | (Singh et al., 2013) Publication | | 247.1 GB | 247.1 GB | 0 | | |
| cpg0019-moshkov-deepprofiler | 8.3 million single cells from 232 plates, across 488 treatments from 5 public datasets, used for learning representations | (Moshkov et al., 2022) Preprint | data, software | 522 GB | 482 GB | 40 GB | dataset dependent | |
| cpg0021-periscope | 30 million cells with 20,000 single-gene knockouts in pooled format. A549 cells and HeLa cells in two growth media. | (Ramezani, Bauman, Singh, and Weisbart et al., 2023) Preprint. Description of Cell Painting pooled variant. | analysis, data, data | 56.0 TB | 45.0 TB | 11.0 TB | pooled | |
| cpg0022-cmqtl | 297 iPSC lines | (Tegtmeyer et al., 2024) Publication, Preprint | data | 3.7 TB | 2.8 TB | 945 GB | v2.5 | |
| cpg0028-kelley-resistance | Bortezomib-resistant HCT116 clones | (Kelley et al., 2023) Publication | data | 4.1 TB | 1.9 TB | 2.2 TB | | |
| cpg0030-gustafsdottir-cellpainting | U2OS cells treated with each of 1600 known bioactive compounds | Description of Cell Painting v1. (Gustafsdottir et al., 2013) Publication | | 234 GB | 234 GB | 0.3 GB | v1 | BBBC022, idr0036 |
| cpg0031-caicedo-cmvip | ORF over-expression of 596 alleles of 53 genes in A549 cells | (Caicedo et al., 2023) Publication, Preprint | original data, re-profiled data | 2.2 TB | 605 GB | 1.6 TB | v1 | BBBC043, LUAD |