1. Install HELP from GitHub
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Skip this cell if you have already installed HELP.

.. code:: ipython3

    !pip install git+https://github.com/giordamaug/HELP.git

2. Download the input files
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Download the following DepMap files (hosted on figshare): the CRISPR
gene-effect scores (``CRISPRGeneEffect.csv``) and the cell-line annotation
table that maps each cell line to its tissue (``Model.csv``). Skip this step
if you already have these files locally.

.. code:: ipython3

    !wget -c https://figshare.com/ndownloader/files/43346616 -O CRISPRGeneEffect.csv
    !wget -c https://figshare.com/ndownloader/files/43746708 -O Model.csv

3. Load the input file
~~~~~~~~~~~~~~~~~~~~~~

Load the CRISPR gene-effect matrix, transpose it so that rows are genes and
columns are cell lines, and show its content.

.. code:: ipython3

    import pandas as pd

    # In CRISPRGeneEffect.csv each row is a cell line (its ID is in the first,
    # unnamed column) and each column is a gene labelled like "A1BG (1)".
    df_orig = (pd.read_csv("CRISPRGeneEffect.csv")
               .rename(columns={'Unnamed: 0': 'gene'})
               .rename(columns=lambda x: x.split(' ')[0])  # keep only the gene symbol
               .set_index('gene')
               .T)                                          # rows: genes, columns: cell lines
    # Count the missing scores over the whole matrix.
    print(f'{df_orig.isna().sum().sum()} NaNs over {len(df_orig)*len(df_orig.columns)} values')
    df_orig

.. parsed-literal::

    739493 NaNs over 20287300 values


.. csv-table::
   :header-rows: 1

   gene, ACH-000001, ACH-000004, ACH-000005, ACH-000007, ACH-000009, ACH-000011, ACH-000012, ACH-000013, ACH-000015, ACH-000017, ..., ACH-002693, ACH-002710, ACH-002785, ACH-002799, ACH-002800, ACH-002834, ACH-002847, ACH-002922, ACH-002925, ACH-002926
   A1BG, -0.122637, 0.019756, -0.107208, -0.031027, 0.008888, 0.022670, -0.096631, 0.049811, -0.099040, -0.044896, ..., -0.072582, -0.033722, -0.053881, -0.060617, 0.025795, -0.055721, -0.009973, -0.025991, -0.127639, -0.068666
   A1CF, 0.025881, -0.083640, -0.023211, -0.137850, -0.146566, -0.057743, -0.024440, -0.158811, -0.070409, -0.115830, ..., -0.237311, -0.108704, -0.114864, -0.042591, -0.132627, -0.121228, -0.119813, -0.007706, -0.040705, -0.107530
   A2M, 0.034217, -0.060118, 0.200204, 0.067704, 0.084471, 0.079679, 0.041922, -0.003968, -0.029389, 0.024537, ..., -0.065940, 0.079277, 0.069333, 0.030989, 0.249826, 0.072790, 0.044097, -0.038468, 0.134556, 0.067806
   A2ML1, -0.128082, -0.027417, 0.116039, 0.107988, 0.089419, 0.227512, 0.039121, 0.034778, 0.084594, -0.003710, ..., 0.101541, 0.038977, 0.066599, 0.043809, 0.064657, 0.021916, 0.041358, 0.236576, -0.047984, 0.112071
   A3GALT2, -0.031285, -0.036116, -0.172227, 0.007992, 0.065109, -0.130448, 0.028947, -0.120875, -0.052288, -0.336776, ..., 0.005374, -0.144070, -0.256227, -0.116473, -0.294305, -0.221940, -0.146565, -0.239690, -0.116114, -0.149897
   ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ...
   ZYG11A, -0.289724, 0.032983, -0.201273, -0.100344, -0.112703, 0.013401, 0.005124, -0.089180, -0.005409, -0.070396, ..., -0.296880, -0.084936, -0.128569, -0.110504, -0.087171, 0.024959, -0.119911, -0.079342, -0.043555, -0.045115
   ZYG11B, -0.062972, -0.410392, -0.178877, -0.462160, -0.598698, -0.296421, -0.131949, -0.145737, -0.216393, -0.257916, ..., -0.332415, -0.193408, -0.327408, -0.257879, -0.349111, 0.015259, -0.289412, -0.347484, -0.335270, -0.307900
   ZYX, 0.074180, 0.113156, -0.055349, -0.001555, 0.095877, 0.067705, -0.109147, -0.034886, -0.137350, 0.029457, ..., -0.005090, -0.218960, -0.053033, -0.041612, -0.057478, -0.306562, -0.195097, -0.085302, -0.208063, 0.070671
   ZZEF1, 0.111244, 0.234388, -0.002161, -0.325964, -0.026742, -0.232453, -0.164482, -0.175850, -0.168087, -0.284838, ..., -0.188751, -0.120449, -0.267081, 0.006148, -0.189602, -0.148368, -0.206400, -0.095965, -0.094741, -0.187813
   ZZZ3, -0.467908, -0.088306, -0.186842, -0.486660, -0.320759, -0.347234, -0.277397, -0.519586, -0.282338, -0.247634, ..., -0.239991, -0.311396, -0.202158, -0.195154, -0.107107, -0.579576, -0.486525, -0.346272, -0.222404, -0.452143

18443 rows × 1100 columns
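
The print statement in step 3 reports a single total: 739493 missing scores out of 18443 × 1100 = 20287300 values, i.e. roughly 3.6% of the matrix. The sketch below uses plain pandas and is not part of HELP; the helper name ``nan_profile``, its ``max_frac`` threshold and the toy values are illustrative assumptions. It shows how those NaNs could be broken down per gene and per cell line before deciding whether to drop or impute them:

.. code:: ipython3

    import numpy as np
    import pandas as pd

    def nan_profile(df: pd.DataFrame, max_frac: float = 0.0):
        """Summarise missing values in a genes x cell-lines matrix such as df_orig.

        Returns (per_gene, per_line, keep): the NaN fraction of each row (gene),
        the NaN fraction of each column (cell line), and the genes whose NaN
        fraction does not exceed max_frac (a hypothetical threshold).
        """
        missing = df.isna()
        per_gene = missing.mean(axis=1)  # average over cell lines: one value per gene
        per_line = missing.mean(axis=0)  # average over genes: one value per cell line
        keep = per_gene[per_gene <= max_frac].index
        return per_gene, per_line, keep

    # Toy matrix with made-up values, oriented like df_orig (rows = genes, columns = cell lines).
    toy = pd.DataFrame(
        {"CL1": [-0.12, np.nan, 0.03], "CL2": [0.02, np.nan, np.nan]},
        index=["G1", "G2", "G3"],
    )
    per_gene, per_line, keep = nan_profile(toy)
    print(per_gene.tolist())  # [0.0, 1.0, 0.5]
    print(list(keep))         # ['G1']

On the real data, ``per_gene, per_line, keep = nan_profile(df_orig)`` followed by ``df_orig.loc[keep]`` would keep only the genes scored in every cell line; a larger ``max_frac`` tolerates some missing values.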

.. csv-table::
   :header-rows: 1

   "", ModelID, PatientID, CellLineName, StrippedCellLineName, DepmapModelType, OncotreeLineage, OncotreePrimaryDisease, OncotreeSubtype, OncotreeCode, LegacyMolecularSubtype, ..., TissueOrigin, CCLEName, CatalogNumber, PlateCoating, ModelDerivationMaterial, PublicComments, WTSIMasterCellID, SangerModelID, COSMICID, LegacySubSubtype
   0, ACH-000001, PT-gj46wT, NIH:OVCAR-3, NIHOVCAR3, HGSOC, Ovary/Fallopian Tube, Ovarian Epithelial Tumor, High-Grade Serous Ovarian Cancer, HGSOC, NaN, ..., NaN, NIHOVCAR3_OVARY, HTB-71, NaN, NaN, NaN, 2201.0, SIDM00105, 905933.0, high_grade_serous
   1, ACH-000002, PT-5qa3uk, HL-60, HL60, AML, Myeloid, Acute Myeloid Leukemia, Acute Myeloid Leukemia, AML, NaN, ..., NaN, HL60_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE, CCL-240, NaN, NaN, NaN, 55.0, SIDM00829, 905938.0, M3
   2, ACH-000003, PT-puKIyc, CACO2, CACO2, COAD, Bowel, Colorectal Adenocarcinoma, Colon Adenocarcinoma, COAD, NaN, ..., NaN, CACO2_LARGE_INTESTINE, HTB-37, NaN, NaN, NaN, NaN, SIDM00891, NaN, NaN
   3, ACH-000004, PT-q4K2cp, HEL, HEL, AML, Myeloid, Acute Myeloid Leukemia, Acute Myeloid Leukemia, AML, NaN, ..., NaN, HEL_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE, ACC 11, NaN, NaN, NaN, 783.0, SIDM00594, 907053.0, M6
   4, ACH-000005, PT-q4K2cp, HEL 92.1.7, HEL9217, AML, Myeloid, Acute Myeloid Leukemia, Acute Myeloid Leukemia, AML, NaN, ..., NaN, HEL9217_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE, HEL9217, NaN, NaN, NaN, NaN, SIDM00593, NaN, M6
   ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ...
   1916, ACH-003157, PT-QDEP9D, ABM-T0822, ABMT0822, ZIMMMPLC, Lung, Non-Cancerous, Immortalized MPLC Cells, NaN, NaN, ..., NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
   1917, ACH-003158, PT-nszsxG, ABM-T9220, ABMT9220, ZIMMSMCI, Muscle, Non-Cancerous, "Immortalized Smooth Muscle Cells, Intestinal", NaN, NaN, ..., NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
   1918, ACH-003159, PT-AUxVvV, ABM-T9233, ABMT9233, ZIMMRSCH, Hair, Non-Cancerous, Immortalized Hair Follicle Inner Root Sheath C..., NaN, NaN, ..., NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
   1919, ACH-003160, PT-AUxVvV, ABM-T9249, ABMT9249, ZIMMGMCH, Hair, Non-Cancerous, Immortalized Hair Germinal Matrix Cells, NaN, NaN, ..., NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN
   1920, ACH-003161, PT-or1hkT, ABM-T9430, ABMT9430, ZIMMPSC, Pancreas, Non-Cancerous, Immortalized Pancreatic Stromal Cells, NaN, NaN, ..., NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN

1921 rows × 36 columns
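
The table above previews ``Model.csv``: its ``ModelID`` column holds the same ``ACH-`` identifiers that label the gene-effect columns, and ``OncotreeLineage`` records the tissue of each cell line. The following is a minimal sketch in plain pandas, not HELP's own API; the helpers ``lineage_map`` and ``lines_by_lineage`` are hypothetical names. It turns this table into a cell-line-to-tissue map and lists the cell lines of one tissue:

.. code:: ipython3

    import pandas as pd

    def lineage_map(model: pd.DataFrame) -> pd.Series:
        """Series indexed by ModelID giving the OncotreeLineage of each cell line."""
        # Cell lines without a lineage annotation are left out of the map.
        return model.set_index("ModelID")["OncotreeLineage"].dropna()

    def lines_by_lineage(model: pd.DataFrame, lineage: str) -> list:
        """ModelIDs of all cell lines annotated with the given lineage."""
        tissue = lineage_map(model)
        return tissue[tissue == lineage].index.tolist()

    # A few rows copied from the Model.csv preview (only the two columns used here).
    toy_model = pd.DataFrame({
        "ModelID": ["ACH-000001", "ACH-000002", "ACH-000004", "ACH-000005"],
        "OncotreeLineage": ["Ovary/Fallopian Tube", "Myeloid", "Myeloid", "Myeloid"],
    })
    print(lines_by_lineage(toy_model, "Myeloid"))  # ['ACH-000002', 'ACH-000004', 'ACH-000005']

With the downloaded file, ``lineage_map(pd.read_csv("Model.csv"))`` would build the same map from the full annotation table.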

.. csv-table::
   :header-rows: 1

   gene, ACH-000001, ACH-000004, ACH-000005, ACH-000007, ACH-000009, ACH-000011, ACH-000012, ACH-000013, ACH-000015, ACH-000017, ..., ACH-002693, ACH-002710, ACH-002785, ACH-002799, ACH-002800, ACH-002834, ACH-002847, ACH-002922, ACH-002925, ACH-002926
   A1BG, -0.122637, 0.019756, -0.107208, -0.031027, 0.008888, 0.022670, -0.096631, 0.049811, -0.099040, -0.044896, ..., -0.072582, -0.033722, -0.053881, -0.060617, 0.025795, -0.055721, -0.009973, -0.025991, -0.127639, -0.068666
   A1CF, 0.025881, -0.083640, -0.023211, -0.137850, -0.146566, -0.057743, -0.024440, -0.158811, -0.070409, -0.115830, ..., -0.237311, -0.108704, -0.114864, -0.042591, -0.132627, -0.121228, -0.119813, -0.007706, -0.040705, -0.107530
   A2M, 0.034217, -0.060118, 0.200204, 0.067704, 0.084471, 0.079679, 0.041922, -0.003968, -0.029389, 0.024537, ..., -0.065940, 0.079277, 0.069333, 0.030989, 0.249826, 0.072790, 0.044097, -0.038468, 0.134556, 0.067806
   A2ML1, -0.128082, -0.027417, 0.116039, 0.107988, 0.089419, 0.227512, 0.039121, 0.034778, 0.084594, -0.003710, ..., 0.101541, 0.038977, 0.066599, 0.043809, 0.064657, 0.021916, 0.041358, 0.236576, -0.047984, 0.112071
   A3GALT2, -0.031285, -0.036116, -0.172227, 0.007992, 0.065109, -0.130448, 0.028947, -0.120875, -0.052288, -0.336776, ..., 0.005374, -0.144070, -0.256227, -0.116473, -0.294305, -0.221940, -0.146565, -0.239690, -0.116114, -0.149897
   ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ...
   ZYG11A, -0.289724, 0.032983, -0.201273, -0.100344, -0.112703, 0.013401, 0.005124, -0.089180, -0.005409, -0.070396, ..., -0.296880, -0.084936, -0.128569, -0.110504, -0.087171, 0.024959, -0.119911, -0.079342, -0.043555, -0.045115
   ZYG11B, -0.062972, -0.410392, -0.178877, -0.462160, -0.598698, -0.296421, -0.131949, -0.145737, -0.216393, -0.257916, ..., -0.332415, -0.193408, -0.327408, -0.257879, -0.349111, 0.015259, -0.289412, -0.347484, -0.335270, -0.307900
   ZYX, 0.074180, 0.113156, -0.055349, -0.001555, 0.095877, 0.067705, -0.109147, -0.034886, -0.137350, 0.029457, ..., -0.005090, -0.218960, -0.053033, -0.041612, -0.057478, -0.306562, -0.195097, -0.085302, -0.208063, 0.070671
   ZZEF1, 0.111244, 0.234388, -0.002161, -0.325964, -0.026742, -0.232453, -0.164482, -0.175850, -0.168087, -0.284838, ..., -0.188751, -0.120449, -0.267081, 0.006148, -0.189602, -0.148368, -0.206400, -0.095965, -0.094741, -0.187813
   ZZZ3, -0.467908, -0.088306, -0.186842, -0.486660, -0.320759, -0.347234, -0.277397, -0.519586, -0.282338, -0.247634, ..., -0.239991, -0.311396, -0.202158, -0.195154, -0.107107, -0.579576, -0.486525, -0.346272, -0.222404, -0.452143

18443 rows × 1091 columns
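
This matrix keeps 1091 of the 1100 cell-line columns loaded in step 3. The cell that performs the reduction is not shown in this section, so its exact criterion is not stated here. As an assumption only, one common way to obtain such an alignment is to keep the columns whose cell line is annotated in ``Model.csv`` and to report the ones dropped. The helper ``align_to_annotation`` below is hypothetical and uses only standard pandas calls:

.. code:: ipython3

    import pandas as pd

    def align_to_annotation(effect: pd.DataFrame, tissue: pd.Series):
        """Keep the cell-line columns of `effect` that appear in the `tissue` index.

        effect: genes x cell-lines matrix oriented like df_orig.
        tissue: Series indexed by ModelID (e.g. a ModelID -> OncotreeLineage map).
        Returns the filtered matrix (original column order kept) and the dropped labels.
        """
        mask = effect.columns.isin(tissue.index)
        return effect.loc[:, mask], effect.columns[~mask].tolist()

    # Toy inputs with made-up scores; "ACH-XXXXXX" is a hypothetical ID with no annotation.
    effect = pd.DataFrame([[-0.12, 0.02, 0.05]], index=["G1"],
                          columns=["ACH-000001", "ACH-000004", "ACH-XXXXXX"])
    tissue = pd.Series({"ACH-000001": "Ovary/Fallopian Tube", "ACH-000004": "Myeloid"})
    filtered, dropped = align_to_annotation(effect, tissue)
    print(filtered.shape, dropped)  # (1, 2) ['ACH-XXXXXX']

Returning the dropped labels alongside the filtered matrix makes it easy to check which cell lines were removed and why, instead of only noticing that the column count changed.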