Tip to help: umap

Hi all,

Just wanted to pass along a tip related to the second video, as it stumped me for a bit when trying to use UMAP.

To use UMAP, you need to install ‘umap-learn’, not ‘umap’ (the ‘umap’ package on PyPI is a different project). So, in case you installed umap, run the following commands to uninstall it and install umap-learn instead:

pip uninstall umap
pip install umap-learn

Then, in your Python code, make sure you import the module with:

import umap.umap_ as umap

Instead of:

import umap
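If you're not sure which package ended up installed, here is a quick sanity check you can run (a minimal sketch of my own; it relies on the fact that umap-learn ships the `umap.umap_` submodule, while the unrelated ‘umap’ package does not):

```python
import importlib.util

def umap_learn_installed() -> bool:
    """Return True if the installed 'umap' package is umap-learn.

    umap-learn provides the submodule umap.umap_, so probing for
    that submodule distinguishes it from the unrelated 'umap'
    package on PyPI.
    """
    try:
        return importlib.util.find_spec("umap.umap_") is not None
    except ModuleNotFoundError:
        # No 'umap' package installed at all.
        return False
```

If this returns False, run the uninstall/install commands above.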

Hope this helps!


Thank you.
To expand on this: UMAP on CPU runs at only about 2 it/s for me, so I tried installing the GPU version of UMAP from RAPIDS (Installation Guide - RAPIDS Docs):

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==23.12.* dask-cudf-cu12==23.12.* cuml-cu12==23.12.* \
    cugraph-cu12==23.12.* cuspatial-cu12==23.12.* cuproj-cu12==23.12.* \
    cuxfilter-cu12==23.12.* cucim-cu12==23.12.* pylibraft-cu12==23.12.*

Then, the function project_embeddings becomes:

import cuml
import numpy as np
from tqdm import tqdm

def project_embeddings_gpu(embeddings, umap_transform):
    # Project each embedding one at a time, mirroring the CPU version's loop
    umap_embeddings = np.empty((len(embeddings), 2))
    for i, embedding in enumerate(tqdm(embeddings)):
        np_embedding = np.array(embedding).reshape(1, -1)
        umap_embeddings[i, :] = umap_transform.transform(np_embedding)
    return umap_embeddings

umap_transform = cuml.UMAP(random_state=0, init="spectral").fit(np.array(embeddings))
projected_dataset_embeddings = project_embeddings_gpu(embeddings, umap_transform)

This version is about 65 times faster, roughly 130 it/s on my laptop. I made sure the settings of both versions match; however, the resulting embedding spaces differ. The GPU version produces more compact clusters, so the projections of queries and retrieved documents end up either too close to or too far from the projected data clusters. This behavior makes it harder to show the points of the lesson. I tried playing around with the UMAP parameters, but I have had no luck so far.

In conclusion, the GPU version is about 65 times faster but undermines the points the instructors wanted to convey. Its settings need to be tuned until a “reasonable” embedding space is achieved.
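One way to make "the embedding spaces differ" measurable (a rough sketch of my own, not from the course; the function names are mine): compare the pairwise-distance structure of the CPU and GPU projections. A correlation near 1.0 means the two embeddings agree geometrically up to a global rescaling.

```python
import numpy as np

def distance_correlation(emb_a, emb_b):
    """Pearson correlation between the pairwise-distance vectors of two
    embeddings of the same points; 1.0 means identical geometry up to
    a uniform rescaling."""
    def upper_distances(x):
        x = np.asarray(x, dtype=float)
        diff = x[:, None, :] - x[None, :, :]          # (n, n, d) differences
        dists = np.sqrt((diff ** 2).sum(axis=-1))     # (n, n) distance matrix
        return dists[np.triu_indices(len(x), k=1)]    # unique pairs only
    da, db = upper_distances(emb_a), upper_distances(emb_b)
    return float(np.corrcoef(da, db)[0, 1])
```

Tracking this number while tuning the cuML parameters would give a concrete target for "reasonable", instead of eyeballing the plots.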

That’s an amazing boost! It’s been running at less than 2 it/s on my laptop as well.

This is certainly a new area for me, but is there any way to bring the GPU results more in line after your code has run? Through post-processing or calibration perhaps?

With that significant speed boost, it just seems to be a promising idea to explore…
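One post-processing option that might be worth a try (a sketch under my own assumptions, not tested against the course data; the function name is mine): an orthogonal Procrustes alignment that rotates, scales, and shifts the GPU projection onto the CPU one.

```python
import numpy as np

def align_embeddings(reference: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Align `target` onto `reference` with an orthogonal Procrustes fit:
    center both point sets, find the rotation and uniform scale that
    minimize the Frobenius distance, and apply them to `target`."""
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)
    ref_centered = reference - reference.mean(axis=0)
    tgt_centered = target - target.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix
    u, s, vt = np.linalg.svd(tgt_centered.T @ ref_centered)
    rotation = u @ vt
    scale = s.sum() / (tgt_centered ** 2).sum()
    return scale * tgt_centered @ rotation + reference.mean(axis=0)
```

Caveat: this only removes rotation, scale, and translation differences. If the GPU clusters really are more compact (a nonlinear difference), alignment alone won't fix that, but it should at least make the two plots directly comparable.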

Yes, I am trying to do so. Interestingly, when using the GPU version without the for-loop, the results are “similar” to the CPU version. However, the instructor noted that he uses the for-loop on purpose. I have not figured out the reason for that part yet.

Hi folks. I’m having trouble reproducing this on my setup.

System Info:

ProductName: macOS
ProductVersion: 14.0
BuildVersion: 23A344
Intel(R) Core™ i9-9880H CPU @ 2.30GHz
hw.memsize: 17179869184

My kernel is crashing when I try to use UMAP:

umap_transform = umap.UMAP(random_state=0, transform_seed=0).fit(embeddings)


Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

21:22:54.397 [info] Dispose Kernel process 1366.
21:22:54.397 [error] Raw kernel process exited code: undefined
21:22:54.399 [error] Error in waiting for cell to complete Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:2:32419)
at ~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:2:51471
at Map.forEach ()
at v._clearKernelState (~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:2:51456)
at v.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:2:44938)
at ~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:24:105531
at te (~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:2:1587099)
at Zg.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:24:105507)
at nv.dispose (~/.vscode/extensions/ms-toolsai.jupyter-2023.4.1011241018-darwin-x64/out/extension.node.js:24:112790)
at processTicksAndRejections (node:internal/process/task_queues:96:5)