1. Install HELP from GitHub (and Karateclub)

Skip this cell if you have already installed HELP and Karateclub.

!pip install git+https://github.com/giordamaug/HELP.git
!pip install -q karateclub

2. Download the PPI input file

The file describing the protein-protein interaction (PPI) network for kidney tissue is available in the HELP repository on GitHub.

!wget https://raw.githubusercontent.com/giordamaug/HELP/main/data/Kidney_PPI.csv

3. Load the PPI input file

The PPI network is provided as a CSV file built from data downloaded from the Integrated Interactions Database (IID). Each row represents one edge, and the three columns are:

+ Column A: gene name of the edge source;
+ Column B: gene name of the edge target;
+ Column combined_score: the edge weight, equal to the number of pieces of evidence supporting the interaction (1, 2 or 3).

import pandas as pd
from HELPpy.preprocess.embedding import PPI_embed
df_net = pd.read_csv('Kidney_PPI.csv')
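
Before starting a long embedding run, it can help to check that the loaded table has the layout described above. The following is a minimal sketch on a toy edge list. It assumes the columns are literally named A, B and combined_score; the gene names and the helper check_ppi are made up for illustration and are not part of HELP.

```python
import io
import pandas as pd

# Toy edge list with the three columns described above (gene names are invented).
toy_csv = io.StringIO(
    "A,B,combined_score\n"
    "GENE1,GENE2,1\n"
    "GENE2,GENE3,3\n"
    "GENE1,GENE3,2\n"
)
toy_net = pd.read_csv(toy_csv)

def check_ppi(df):
    """Hypothetical helper: verify columns and weights, return basic counts."""
    expected = {"A", "B", "combined_score"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Edge weights are expected to be 1, 2 or 3 (number of supporting evidences).
    if not df["combined_score"].isin([1, 2, 3]).all():
        raise ValueError("combined_score must be 1, 2 or 3")
    # A gene counts as a node whether it appears as source or as target.
    n_genes = pd.concat([df["A"], df["B"]]).nunique()
    return {"edges": len(df), "genes": n_genes}

stats = check_ppi(toy_net)
print(stats)
```

In the notebook, the same check could be applied to df_net by calling check_ppi(df_net), provided the column names match the assumption.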

4. Compute the embedding

Compute the embedding with a graph embedding method (here method="Node2Vec"), producing embedding vectors of the default length (128).

df_embed = PPI_embed(df_net, method="Node2Vec", verbose=True)

Be aware that this step takes almost 2 hours when run sequentially.
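
Given the long running time, one option is to cache the result on disk and reload it on later runs instead of recomputing. The sketch below shows that pattern with a stand-in function in place of the real call. The helper load_or_compute, the function fake_embed and the file name toy_embedding.csv are hypothetical and only illustrate the idea.

```python
import os
import tempfile
import pandas as pd

def load_or_compute(path, compute):
    """Hypothetical helper: load a cached embedding, or compute and cache it."""
    if os.path.exists(path):
        # The first CSV column holds the gene names, so use it as the index.
        return pd.read_csv(path, index_col=0)
    df = compute()
    df.to_csv(path)
    return df

# Stand-in for the slow call. In the notebook this could be
# lambda: PPI_embed(df_net, method="Node2Vec", verbose=True)
calls = []
def fake_embed():
    calls.append(1)
    return pd.DataFrame(
        {"Node2Vec_0": [0.1, -0.2], "Node2Vec_1": [0.3, 0.4]},
        index=["GENE1", "GENE2"],
    )

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "toy_embedding.csv")
    first = load_or_compute(cache, fake_embed)   # computes and writes the file
    second = load_or_compute(cache, fake_embed)  # reads the file, no recomputation
```

Note that values reloaded from CSV may differ from the in-memory ones in the last decimal places, since they pass through a text representation.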

5. Save and show the embedding

The result is a DataFrame whose rows are indexed by gene name and whose columns hold the 128 components of the vector representing each node (gene) in the PPI network.

df_embed.to_csv('Kidney_EmbN2V_128.csv')
df_embed
Node2Vec_0 Node2Vec_1 Node2Vec_2 Node2Vec_3 Node2Vec_4 Node2Vec_5 Node2Vec_6 Node2Vec_7 Node2Vec_8 Node2Vec_9 ... Node2Vec_118 Node2Vec_119 Node2Vec_120 Node2Vec_121 Node2Vec_122 Node2Vec_123 Node2Vec_124 Node2Vec_125 Node2Vec_126 Node2Vec_127
(clone tec14) 0.065830 0.015107 -0.169474 -0.017198 0.085214 0.066795 0.021665 -0.033675 0.127271 0.086943 ... -0.139271 -0.192168 -0.024476 -0.059446 0.030652 0.011218 -0.095148 -0.065400 -0.069221 0.019122
100 kDa coactivator -0.120776 0.333710 -0.264869 0.258195 -0.137268 0.223037 0.344388 -0.058348 0.050687 0.190354 ... -0.447794 -0.207273 -0.021812 0.102079 0.379066 0.227051 -0.299870 0.093029 0.420842 -0.299310
14-3-3 tau splice variant -0.207174 0.489269 0.060112 0.033272 -0.582095 0.089639 0.183833 -0.331528 -0.033732 -0.316844 ... -0.135543 -0.491769 0.059879 0.572159 -0.167333 -0.774573 -0.329807 0.241468 -0.139246 0.181745
3'-phosphoadenosine-5'-phosphosulfate synthase 0.073491 0.080999 -0.028227 0.002335 -0.069363 0.091756 -0.091159 -0.080245 0.067129 0.049245 ... -0.042785 -0.081899 -0.041130 0.025566 0.122074 -0.021724 -0.085229 -0.029068 -0.036015 -0.100795
3-beta-hydroxysteroid dehydrogenase 0.067097 -0.061427 0.093204 0.108998 -0.041609 0.058034 0.041132 -0.040696 0.152901 -0.081870 ... -0.233139 -0.060815 0.187243 0.057241 -0.081594 0.062716 -0.078905 -0.121561 -0.014237 0.058866
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
pp10122 0.178080 0.371527 -0.412717 -0.246089 0.214775 -0.506345 -0.290790 -0.150410 0.215877 0.408559 ... -0.709345 -0.260173 0.380616 -0.316627 -0.490632 0.194519 0.108054 -0.426641 0.036487 -0.381604
tRNA-uridine aminocarboxypropyltransferase -0.174120 0.032164 -0.087492 0.100593 -0.302876 0.200717 0.170120 -0.217411 -0.027259 0.027179 ... -0.102030 -0.308913 0.217483 -0.102495 -0.275602 -0.286097 0.077114 -0.144031 0.070092 -0.232707
tmp_locus_54 0.157015 0.218119 -1.155761 0.492320 0.409154 0.175829 -1.217417 -0.528736 -0.462023 0.198218 ... -0.510813 0.149230 0.147140 0.040833 -0.103283 -1.122915 -0.044513 -0.253034 -0.038325 -0.133388
urf-ret 0.335659 0.228930 0.175542 -0.229068 0.083526 0.178109 0.427678 0.007911 0.225716 0.223766 ... -0.747746 -0.156841 0.326407 0.113307 -0.329125 -0.432075 -0.565949 0.077020 0.458489 -0.419929
zf30 -0.410130 -0.274361 0.290211 -0.336239 0.221474 -0.332876 0.159841 -0.259432 0.078994 0.229157 ... -0.663979 -0.332803 0.177944 -0.310315 -0.063604 0.098105 0.360965 0.330712 0.027433 -0.002185

19334 rows × 128 columns
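
Because the embedding is indexed by gene name, the vector of a single gene can be retrieved with .loc and compared with others. The following sketch, on a toy 3-dimensional embedding with invented gene names, computes the cosine similarity between two genes. The helper cosine is an illustrative assumption and not part of HELP.

```python
import numpy as np
import pandas as pd

# Toy embedding laid out like df_embed: gene names as index, one column per component.
toy = pd.DataFrame(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]],
    index=["GENE1", "GENE2", "GENE3"],
    columns=[f"Node2Vec_{i}" for i in range(3)],
)

def cosine(df, gene_a, gene_b):
    """Cosine similarity between the embedding vectors of two genes."""
    u = df.loc[gene_a].to_numpy()
    v = df.loc[gene_b].to_numpy()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_same_direction = cosine(toy, "GENE1", "GENE3")  # parallel vectors
sim_orthogonal = cosine(toy, "GENE1", "GENE2")      # perpendicular vectors
print(sim_same_direction, sim_orthogonal)
```

With the real result, the same function could be called as cosine(df_embed, gene_a, gene_b) for any two gene names present in the index.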