Dimensionality Reduction

Dimensionality reduction simplifies a dataset by reducing the number of features while preserving its essential information. The umap function in Graphistry offers this capability, combined with automated feature engineering. It projects a high-dimensional dataset into a lower-dimensional space while maintaining as much of the data's original structure as possible. Such transformations are useful for visualization, data analysis, and cluster analysis. They also help mitigate the 'curse of dimensionality', which can contribute to problems such as overfitting. More details can be found in the umap-learn documentation.

UMAP Integration

Uniform Manifold Approximation and Projection (UMAP) works on top of Graphistry's feature engineering tools, and featurization runs implicitly when you call umap. The featurization step can also be run on its own:

g2 = graphistry.nodes(df).featurize()

This returns a new g object with the added property _node_features and, where applicable, _edge_features. featurize is typically called with X= and, optionally, y=; a feature_engine= option is also available. UMAP can also be called directly:

g.umap()
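
The X=, y=, and feature_engine= parameters mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the helper name featurize_nodes is hypothetical, and it assumes that X and y accept lists of column names and that 'auto' is an accepted feature_engine value for featurize.

```python
def featurize_nodes(g, feature_cols, target_cols=None, feature_engine="auto"):
    """Featurize the nodes of a Graphistry object `g` from selected columns.

    Hypothetical helper: assumes `X` and `y` accept lists of column names.
    """
    kwargs = {"X": list(feature_cols), "feature_engine": feature_engine}
    if target_cols:
        # y is optional; pass it only when target columns are given
        kwargs["y"] = list(target_cols)
    # featurize() returns a new object rather than modifying `g`
    return g.featurize(**kwargs)


# Usage sketch (requires graphistry and pandas; not executed here):
#   g = graphistry.nodes(df)
#   g2 = featurize_nodes(g, ["data"], target_cols=["meta"])
#   g3 = g2.umap()
```

Passing the object in, rather than constructing it inside the helper, keeps the column selection separate from how the node table is loaded.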

Using UMAP for Dimensionality Reduction and Visualization:


import graphistry
import pandas as pd

# Small illustrative node table
g = graphistry.nodes(pd.DataFrame({'node': [0,1,2], 'data': [1,2,3], 'meta': ['a', 'b', 'c']}))
# Parameters are listed explicitly for reference
g2 = g.umap(n_components=3, spread=1.0, min_dist=0.1, n_neighbors=12, negative_sample_rate=5, local_connectivity=1, repulsion_strength=1.0, metric='euclidean', suffix='', play=0, encode_position=True, encode_weight=True, dbscan=False, engine='auto', feature_engine='auto', inplace=False, memoize=True, verbose=False)
g2.plot()

Customizable Parameters

Users can also fine-tune UMAP parameters:

  • UMAP: n_components, metric, n_neighbors, min_dist, and others.
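
As a rough sketch of tuning these parameters, the helper below overrides a few starting values before calling umap. The names tuned_umap and UMAP_STARTING_VALUES are hypothetical; the parameter names and starting values are copied from the example umap call in this section, and which settings suit a given dataset is something to determine empirically.

```python
# Starting values copied from the example umap call in this section.
UMAP_STARTING_VALUES = {
    "n_components": 3,      # dimensionality of the embedding
    "metric": "euclidean",  # distance measure in feature space
    "n_neighbors": 12,      # larger values emphasize global structure
    "min_dist": 0.1,        # smaller values pack points more tightly
}


def tuned_umap(g, **overrides):
    """Call g.umap() with the starting values, replaced by any overrides.

    Hypothetical helper: it only accepts the four parameters listed above.
    """
    unknown = set(overrides) - set(UMAP_STARTING_VALUES)
    if unknown:
        raise ValueError(f"not handled by this sketch: {sorted(unknown)}")
    params = {**UMAP_STARTING_VALUES, **overrides}
    return g.umap(**params)


# Usage sketch (requires a Graphistry object `g`; not executed here):
#   g2 = tuned_umap(g, n_neighbors=30, min_dist=0.5)
#   g2.plot()
```

Rejecting unknown names is a design choice for this sketch only: it catches typos early instead of forwarding them to umap.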