1. Install HELP from GitHub (and Karateclub)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Skip this cell if you have already installed HELP and Karateclub.

.. code:: ipython3

    !pip install git+https://github.com/giordamaug/HELP.git
    !pip install -q karateclub

2. Download the PPI input file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The file describing the PPI network for the Kidney tissue can be found on GitHub.

.. code:: ipython3

    !wget https://raw.githubusercontent.com/giordamaug/HELP/main/data/Kidney_PPI.csv

3. Load the PPI input file
~~~~~~~~~~~~~~~~~~~~~~~~~~

The PPI network is given as a CSV file based on data downloaded from the
Integrated Interaction Database (IID). Each row represents one edge, and
the three columns are:

+ Column ``A``: gene name of the edge source;
+ Column ``B``: gene name of the edge target;
+ Column ``combined_score``: the edge weight, which is 1, 2 or 3 depending
  on whether the edge is supported by one, two or three pieces of evidence.

.. code:: ipython3

    import pandas as pd
    from HELPpy.preprocess.embedding import PPI_embed
    df_net = pd.read_csv('Kidney_PPI.csv')

4. Compute the embedding
~~~~~~~~~~~~~~~~~~~~~~~~

Compute the embedding with a graph embedding method (here
``method="Node2Vec"``) to produce embedding vectors of the default
length (128).

.. code:: ipython3

    df_embed = PPI_embed(df_net, method="Node2Vec", verbose=True)

Please be aware that this takes almost 2 hours in sequential execution.

5. Save and show the embedding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The embedding result is a DataFrame whose rows are indexed by gene name
and whose columns are the components of the 128-dimensional vector
representing each node/gene in the PPI network.

.. code:: ipython3

    df_embed.to_csv('Kidney_EmbN2V_128.csv')
    df_embed

Gene | Node2Vec_0 | Node2Vec_1 | Node2Vec_2 | Node2Vec_3 | Node2Vec_4 | Node2Vec_5 | Node2Vec_6 | Node2Vec_7 | Node2Vec_8 | Node2Vec_9 | ... | Node2Vec_118 | Node2Vec_119 | Node2Vec_120 | Node2Vec_121 | Node2Vec_122 | Node2Vec_123 | Node2Vec_124 | Node2Vec_125 | Node2Vec_126 | Node2Vec_127 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(clone tec14) | 0.065830 | 0.015107 | -0.169474 | -0.017198 | 0.085214 | 0.066795 | 0.021665 | -0.033675 | 0.127271 | 0.086943 | ... | -0.139271 | -0.192168 | -0.024476 | -0.059446 | 0.030652 | 0.011218 | -0.095148 | -0.065400 | -0.069221 | 0.019122 |
100 kDa coactivator | -0.120776 | 0.333710 | -0.264869 | 0.258195 | -0.137268 | 0.223037 | 0.344388 | -0.058348 | 0.050687 | 0.190354 | ... | -0.447794 | -0.207273 | -0.021812 | 0.102079 | 0.379066 | 0.227051 | -0.299870 | 0.093029 | 0.420842 | -0.299310 |
14-3-3 tau splice variant | -0.207174 | 0.489269 | 0.060112 | 0.033272 | -0.582095 | 0.089639 | 0.183833 | -0.331528 | -0.033732 | -0.316844 | ... | -0.135543 | -0.491769 | 0.059879 | 0.572159 | -0.167333 | -0.774573 | -0.329807 | 0.241468 | -0.139246 | 0.181745 |
3'-phosphoadenosine-5'-phosphosulfate synthase | 0.073491 | 0.080999 | -0.028227 | 0.002335 | -0.069363 | 0.091756 | -0.091159 | -0.080245 | 0.067129 | 0.049245 | ... | -0.042785 | -0.081899 | -0.041130 | 0.025566 | 0.122074 | -0.021724 | -0.085229 | -0.029068 | -0.036015 | -0.100795 |
3-beta-hydroxysteroid dehydrogenase | 0.067097 | -0.061427 | 0.093204 | 0.108998 | -0.041609 | 0.058034 | 0.041132 | -0.040696 | 0.152901 | -0.081870 | ... | -0.233139 | -0.060815 | 0.187243 | 0.057241 | -0.081594 | 0.062716 | -0.078905 | -0.121561 | -0.014237 | 0.058866 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
pp10122 | 0.178080 | 0.371527 | -0.412717 | -0.246089 | 0.214775 | -0.506345 | -0.290790 | -0.150410 | 0.215877 | 0.408559 | ... | -0.709345 | -0.260173 | 0.380616 | -0.316627 | -0.490632 | 0.194519 | 0.108054 | -0.426641 | 0.036487 | -0.381604 |
tRNA-uridine aminocarboxypropyltransferase | -0.174120 | 0.032164 | -0.087492 | 0.100593 | -0.302876 | 0.200717 | 0.170120 | -0.217411 | -0.027259 | 0.027179 | ... | -0.102030 | -0.308913 | 0.217483 | -0.102495 | -0.275602 | -0.286097 | 0.077114 | -0.144031 | 0.070092 | -0.232707 |
tmp_locus_54 | 0.157015 | 0.218119 | -1.155761 | 0.492320 | 0.409154 | 0.175829 | -1.217417 | -0.528736 | -0.462023 | 0.198218 | ... | -0.510813 | 0.149230 | 0.147140 | 0.040833 | -0.103283 | -1.122915 | -0.044513 | -0.253034 | -0.038325 | -0.133388 |
urf-ret | 0.335659 | 0.228930 | 0.175542 | -0.229068 | 0.083526 | 0.178109 | 0.427678 | 0.007911 | 0.225716 | 0.223766 | ... | -0.747746 | -0.156841 | 0.326407 | 0.113307 | -0.329125 | -0.432075 | -0.565949 | 0.077020 | 0.458489 | -0.419929 |
zf30 | -0.410130 | -0.274361 | 0.290211 | -0.336239 | 0.221474 | -0.332876 | 0.159841 | -0.259432 | 0.078994 | 0.229157 | ... | -0.663979 | -0.332803 | 0.177944 | -0.310315 | -0.063604 | 0.098105 | 0.360965 | 0.330712 | 0.027433 | -0.002185 |
19334 rows × 128 columns
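
Because the embedding takes a long time to compute, it is usually worth reloading the saved CSV rather than recomputing it. The sketch below shows one way to round-trip such a file with pandas while keeping the gene names as the index. It runs on a small made-up frame (``toy_embed``, with hypothetical gene names and only 4 components) and writes to an in-memory buffer, so it does not touch ``Kidney_EmbN2V_128.csv`` or call HELP itself.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for df_embed: 3 made-up genes x 4 components.
# The real frame shown above has 19334 rows and 128 columns.
rng = np.random.default_rng(0)
toy_embed = pd.DataFrame(
    rng.normal(size=(3, 4)),
    index=["geneA", "geneB", "geneC"],
    columns=[f"Node2Vec_{i}" for i in range(4)],
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy_embed.to_csv(buffer)
buffer.seek(0)

# index_col=0 turns the first CSV column back into the gene-name index;
# without it the names are read as an ordinary column named "Unnamed: 0".
reloaded = pd.read_csv(buffer, index_col=0)

print(reloaded.shape)
print(reloaded.index.tolist())
```

For the real file, the same ``pd.read_csv(..., index_col=0)`` call applied to ``'Kidney_EmbN2V_128.csv'`` should give back a frame with the shape reported above, assuming the file was written with the default ``to_csv`` settings as in step 5.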