Mauricio, I tried your idea on some 3D data and it kind of works. But there is a high chance that a 2D projection of higher-dimensional data is not suitable for finding the similarities you are looking for: too many points will likely overlap.
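
One way to put a number on that overlap is scikit-learn's `trustworthiness` score. It checks whether the nearest neighbours of each point in the 2D embedding were also near each other in the original space, and 1.0 means no "false neighbours" were introduced. The sketch below is a minimal illustration under assumptions: the data is a toy torus, PCA is the projection, and `n_neighbors=10` is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)

# Toy 3D data (assumption): 700 random points on a torus, major radius 5, tube radius 2
theta = 2 * np.pi * rng.random(700)
phi = 2 * np.pi * rng.random(700)
X = np.column_stack((
    (5 + 2 * np.cos(phi)) * np.cos(theta),
    (5 + 2 * np.cos(phi)) * np.sin(theta),
    2 * np.sin(phi),
))

# Project to 2D and measure how well 10-nearest-neighbour sets survive the projection
X_2d = PCA(n_components=2).fit_transform(X)
score = trustworthiness(X, X_2d, n_neighbors=10)
print(f"trustworthiness of the 2D projection: {score:.3f}")
```

Values noticeably below 1.0 mean that points which look close in the 2D plot were not actually close in the original space.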

My suggestion: use interactive 3D, or even interactive 3D plus time (as a 4th dimension), so that as much information as possible stays in actual dimensions. Also use DBSCAN (it is fast, flags outliers, and does not need the number of clusters) to identify clusters based on two simple, domain-specific parameters (eps and the minimum number of points, called min_samples in scikit-learn).
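
To illustrate the two DBSCAN properties mentioned (no number of clusters required, outliers flagged), here is a minimal sketch on made-up 2D data: two tight blobs plus one isolated point. scikit-learn's `DBSCAN` gives noise points the label -1. The blob positions and the `eps`/`min_samples` values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)

# Made-up data: two tight, well-separated blobs plus a single isolated point
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2))
outlier = np.array([[10.0, -10.0]])
X = np.vstack((blob_a, blob_b, outlier))

# No n_clusters argument: the clusters follow from eps and min_samples alone
labels = DBSCAN(eps=0.5, min_samples=4).fit_predict(X)

# Count clusters, ignoring the noise label -1
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)                # → 2
print("label of the isolated point:", labels[-1])   # → -1
```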

Reason: essentially, what you are looking for are points close to each other, and that is exactly what clusters are. So clustering your data is an essential step in whatever visualization you use, whether 2D, 3D, or 3D plus time (movie-like). The colors then express the mathematical closeness of points according to whatever distance measure you define.
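
Because "close" depends entirely on the distance measure, it can pay to cluster under the metric you actually care about. scikit-learn's `DBSCAN` accepts a `metric` argument; the sketch below uses toy vectors (an assumption) that point in two directions at very different lengths, and contrasts Euclidean distance with cosine distance, which only looks at direction. The `eps` values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy vectors (assumption): two directions, each at lengths 1, 2, ..., 10
scales = np.linspace(1.0, 10.0, 10)[:, None]
X = np.vstack((scales * [1.0, 0.0], scales * [0.0, 1.0]))

# Euclidean: neighbouring vectors are 1.0 apart, so with eps=0.5 nothing is dense
euclid_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
# Cosine: same-direction vectors are at distance 0, the two directions at distance 1
cosine_labels = DBSCAN(eps=0.1, min_samples=3, metric="cosine").fit_predict(X)

print("euclidean:", euclid_labels)  # all -1 (every point is noise)
print("cosine:   ", cosine_labels)  # ten 0s followed by ten 1s
```

The same data yields either "all noise" or "two clean clusters" depending only on the metric, which is why the choice of distance measure deserves as much thought as the plot itself.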

Here is some sample code for a toy example:

```python
import numpy as np
import plotly.graph_objects as go
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def generate_points_on_torus(num_points, major_radius, minor_radius):
    # Draw random toroidal angles: theta around the ring, phi around the tube
    theta = 2 * np.pi * np.random.rand(num_points)
    phi = 2 * np.pi * np.random.rand(num_points)
    # Convert the angles to Cartesian coordinates on the torus surface
    x = (major_radius + minor_radius * np.cos(phi)) * np.cos(theta)
    y = (major_radius + minor_radius * np.cos(phi)) * np.sin(theta)
    z = minor_radius * np.sin(phi)
    # Stack into a (num_points, 3) array of x, y, z
    return np.column_stack((x, y, z))


def visualize_points_on_surface(points, title):
    # points: (n, 4) array holding x, y, z and the DBSCAN cluster label
    fig = go.Figure()
    unique_labels = np.unique(points[:, 3])
    # One random color per label (not guaranteed to be distinct)
    colors = ['rgb({}, {}, {})'.format(*np.random.randint(0, 256, size=3))
              for _ in range(len(unique_labels))]
    for label, color in zip(unique_labels, colors):
        cluster_points = points[points[:, 3] == label]
        # DBSCAN marks outliers with the label -1
        name = 'Noise' if label == -1 else f'Cluster {int(label)}'
        trace = go.Scatter3d(
            x=cluster_points[:, 0],
            y=cluster_points[:, 1],
            z=cluster_points[:, 2],
            mode='markers',
            marker=dict(size=5, color=color),
            name=name,
        )
        fig.add_trace(trace)
    fig.update_layout(scene=dict(aspectmode='data'), title=title)
    fig.show()


# Set the number of random points
num_points_torus = 700
# Generate random points on the surface of a torus (donut-style)
torus_major_radius = 5  # Major radius
torus_minor_radius = 2  # Minor radius (tube radius)
torus_points = generate_points_on_torus(num_points_torus, torus_major_radius, torus_minor_radius)

# Standardize the data before applying DBSCAN
scaler = StandardScaler()
torus_points_scaled = scaler.fit_transform(torus_points)

# Perform DBSCAN clustering (watch out: eps is in standardized units, not the original ones!)
epsilon = 0.25
min_pts = 4
dbscan = DBSCAN(eps=epsilon, min_samples=min_pts)
labels = dbscan.fit_predict(torus_points_scaled)

# Visualize the points on the torus after clustering
torus_points_with_labels = np.column_stack((torus_points, labels))
visualize_points_on_surface(torus_points_with_labels, 'Points on the Surface of a Torus (After Clustering)')
```
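
Since eps is expressed in standardized units, the clustering can change a lot with small changes to it. A minimal way to check that sensitivity is to rerun DBSCAN over a few eps values and count clusters and noise points. The sketch below regenerates a torus of the same shape with a fixed seed; the eps grid is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Same kind of data as the toy example: 700 random points on a torus (R=5, r=2)
theta = 2 * np.pi * rng.random(700)
phi = 2 * np.pi * rng.random(700)
points = np.column_stack((
    (5 + 2 * np.cos(phi)) * np.cos(theta),
    (5 + 2 * np.cos(phi)) * np.sin(theta),
    2 * np.sin(phi),
))
points_scaled = StandardScaler().fit_transform(points)

# Sweep eps (assumed grid) with min_samples fixed and record clusters / noise
results = {}
for eps in (0.1, 0.25, 0.5, 1.0):
    labels = DBSCAN(eps=eps, min_samples=4).fit_predict(points_scaled)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(f"eps={eps:<5} clusters={n_clusters:<4} noise points={n_noise}")
```

A range of eps over which the cluster count stays stable is usually a safer pick than a single hand-tuned value.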

Your idea, by contrast, would look something like this:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

# Initialize the data set (torus_points from the example above)
A = torus_points
# Calculate a pairwise matrix of cosine similarities among the vectors
A_cos_sim = cosine_similarity(A, A)
# Apply a 2-component PCA to the cosine-similarity matrix
pca = PCA(n_components=2)
A_res = pca.fit_transform(A_cos_sim)
# Share of the total variance captured by each of the 2 components
print(pca.explained_variance_ratio_)
# Plot the 2 PCA dimensions as a scatter plot
plt.scatter(A_res[:, 0], A_res[:, 1])
plt.show()
```

Naturally, one of the most important things to check is the proportion of variance explained when PCA reduces the data to 2 dimensions. If the 2 principal components capture only a small share of the total variance, you need more dimensions.
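
A minimal way to apply that rule is to fit PCA with all components, accumulate `explained_variance_ratio_`, and take the smallest number of components that reaches a target share. The sketch below assumes a 90% target (an arbitrary threshold) and uses the cosine-similarity matrix of a small random torus as stand-in data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Stand-in data (assumption): 300 random points on a torus, then their cosine similarities
theta = 2 * np.pi * rng.random(300)
phi = 2 * np.pi * rng.random(300)
A = np.column_stack((
    (5 + 2 * np.cos(phi)) * np.cos(theta),
    (5 + 2 * np.cos(phi)) * np.sin(theta),
    2 * np.sin(phi),
))
A_cos_sim = cosine_similarity(A)

# Fit PCA with all components and accumulate the explained-variance ratios
pca = PCA().fit(A_cos_sim)
cumulative = np.cumsum(pca.explained_variance_ratio_)

target = 0.90  # assumed threshold; choose what your use case can tolerate
# First index where the cumulative share reaches the target, converted to a count
n_needed = int(np.searchsorted(cumulative, target) + 1)
print("variance kept by 2 components:", round(float(cumulative[1]), 3))
print(f"components needed for {target:.0%}:", n_needed)
```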