Environment for AI4M Course 2 Week 4 Assignment

Can you share the packages necessary to recreate the environment used for AI4M Course 2 Week 4 Assignment?

For example, which versions of R, Python, etc. are being used?

I tried downloading r-ggplot2, r-base, etc. to recreate the environment, and I can make it through most of the coding assignment without errors, but I cannot seem to get the program to recognize randomForestSRC. Please see below. Thank you so much in advance for your help!


Input:
%load_ext rpy2.ipython
%R require(ggplot2)

from rpy2.robjects.packages import importr
# import R's "base" package
base = importr('base')
print(base._libPaths())

# import R's "utils" package
utils = importr('utils')

# import rpy2's package module
import rpy2.robjects.packages as rpackages

forest = rpackages.importr('randomForestSRC', lib_loc='R') # Original
# forest = rpackages.importr('randomForestSRC', lib_loc="/opt/anaconda3/envs/Python_3_7/lib/R/library")

from rpy2 import robjects as ro
R = ro.r

from rpy2.robjects import pandas2ri
pandas2ri.activate()


Output:


RRuntimeError                             Traceback (most recent call last)
/var/folders/zt/gkxs3_z54sq5jvzkdb97g21m0000gn/T/ipykernel_34474/3496622856.py in <module>
     13 import rpy2.robjects.packages as rpackages
     14
---> 15 forest = rpackages.importr('randomForestSRC', lib_loc='R') # Original
     16 # forest = rpackages.importr('randomForestSRC', lib_loc="/opt/anaconda3/envs/Python_3_7/lib/R/library")
     17

/opt/anaconda3/envs/Python_3_7/lib/python3.7/site-packages/rpy2/robjects/packages.py in importr(name, lib_loc, robject_translations, signature_translation, suppress_messages, on_conflict, symbol_r2python, symbol_check_after, data)
    451     if _package_has_namespace(rname,
    452                               _system_file(package = rname)):
--> 453         env = _get_namespace(rname)
    454         version = _get_namespace_version(rname)[0]
    455         exported_names = set(_get_namespace_exports(rname))

RRuntimeError: Error in library.dynam(lib, package, package.lib) :
shared object ‘randomForestSRC.dylib’ not found

You can insert a code cell into your notebook and run this:

%pip list

It shows which Python packages and versions the runtime environment has. You can then use that list to build a matching environment locally.
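If copying the list by hand is tedious, the same information can be written out as pinned requirement lines. This is only a rough sketch: the helper names are made up for illustration, and it assumes Python 3.8 or newer for importlib.metadata (on 3.7, `%pip freeze` gives similar output). It covers Python packages only; R packages would have to be listed separately from R, for example with installed.packages().

```python
# Sketch: print the installed Python packages as "name==version" lines.
# Assumes Python 3.8+; the function names here are illustrative only.
from importlib import metadata


def pinned_requirements(pairs):
    """Turn (name, version) pairs into sorted, de-duplicated 'name==version' lines."""
    unique = {
        name.lower(): (name, version)
        for name, version in pairs
        if name and version  # skip distributions with broken metadata
    }
    return [f"{name}=={version}" for _, (name, version) in sorted(unique.items())]


def installed_packages():
    """Yield (name, version) for every distribution this interpreter can see."""
    for dist in metadata.distributions():
        yield dist.metadata["Name"], dist.version


if __name__ == "__main__":
    print("\n".join(pinned_requirements(installed_packages())))
```

Saving that output to a file in the Coursera notebook and running pip install -r on it locally should get the Python side close, though some packages may not be installable on a different operating system.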

Thank you! This helped fix several small bugs in the code.

I am only receiving an error message about randomForestSRC now.

I have recreated the entire “R” folder and all of its contents as seen in the folder structure with C2M4_Assignment.ipynb on Coursera on my local machine, but on my local machine, Python is unable to recognize/find the file necessary to run randomForestSRC.

Do you have a recommendation for where the randomForestSRC folder/file needs to be saved so that Python can find the package on my local machine (if you believe this is even the main issue)? Perhaps lib_loc= in my code needs to be changed to something other than “R” as seen in the assignment?
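One thing that can be checked before changing lib_loc: the error names a specific file, randomForestSRC.dylib. A possible explanation, which is an assumption and not something confirmed in this thread, is that the copied R folder contains a library compiled for the Coursera machine and not for the local one. If so, no lib_loc value would help and the package would need to be installed locally. The sketch below only lists what is inside the package's libs folder; the helper names are hypothetical.

```python
# Sketch: see which compiled files a copied R package folder actually contains.
# Assumption: the package sits at <lib_loc>/randomForestSRC, as in the assignment.
from pathlib import Path


def compiled_libraries(lib_loc, package="randomForestSRC"):
    """Return the file names under <lib_loc>/<package>/libs, or [] if it is absent."""
    libs_dir = Path(lib_loc) / package / "libs"
    if not libs_dir.is_dir():
        return []
    return sorted(p.name for p in libs_dir.rglob("*") if p.is_file())


def has_expected_library(lib_loc, package="randomForestSRC", extension=".dylib"):
    """True if the file named in the error message (<package><extension>) exists."""
    return f"{package}{extension}" in compiled_libraries(lib_loc, package)


if __name__ == "__main__":
    found = compiled_libraries("R")
    print("compiled files:", found or "none")
    print("has randomForestSRC.dylib:", has_expected_library("R"))
```

If the folder holds a file with a different extension than the one in the error message, that would point toward a platform mismatch and away from a path problem.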

Hopefully, the images below are useful to you:



Thank you again for your help.

https://pypi.org/project/random-survival-forest/

I took that course a long time ago and am away from my computer now. Are you running rfSRC in Python or R?

I will try pip install random-survival-forest (per your link, thank you) and see if I can adapt the code to match its arguments.

I am using Jupyter Notebook (Python 3.7) to run all of the code posted above. rfSRC seems to be an R package. I am just not sure where the randomForestSRC package needs to be saved on my local machine, or how its location can be retrieved in the Python code.
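For the "where does R look" part, one way to check is to take the library paths that base._libPaths() reports (the same call as in the original post) and see which of them contain a randomForestSRC folder. The following is a minimal sketch; find_package is a hypothetical helper, and it assumes an installed R package can be recognised by the DESCRIPTION file in its folder.

```python
# Sketch: find which R library directories contain an installed copy of a package.
# find_package is an illustrative helper, not part of rpy2.
from pathlib import Path


def find_package(lib_paths, package="randomForestSRC"):
    """Return the library directories that hold <package>/DESCRIPTION."""
    hits = []
    for lib in lib_paths:
        # An installed R package is a folder with a DESCRIPTION file inside it.
        if (Path(lib) / package / "DESCRIPTION").is_file():
            hits.append(str(lib))
    return hits


if __name__ == "__main__":
    try:
        from rpy2.robjects.packages import importr

        base = importr("base")
        lib_paths = list(base._libPaths())  # same call as in the post above
    except Exception as exc:  # rpy2 or R may be missing in this environment
        print("could not start R through rpy2:", exc)
    else:
        print("library paths:", lib_paths)
        print("randomForestSRC found in:", find_package(lib_paths))
```

If the package shows up in one of those directories, importr('randomForestSRC') without lib_loc should normally be able to find it; if it shows up nowhere, it has not been installed into the R that rpy2 is using.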

My experience from 2 years ago is in a post on the old Coursera forums, but it looks like I can’t link directly to it from here. Below is the code. I don’t think it violates the honor code, since this part is boilerplate. Ignore the DM I sent before; this is better.

from rpy2.robjects.packages import importr

# import rpy2's package module
import rpy2.robjects.packages as rpackages

utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)
package_names = ('base','ggplot2','randomForestSRC')
from rpy2.robjects.vectors import StrVector
utils.install_packages(StrVector(package_names))

from rpy2 import robjects as ro
R = ro.r

from rpy2.robjects import pandas2ri
pandas2ri.activate()

forest = rpackages.importr('randomForestSRC')
model = forest.rfsrc(ro.Formula('Surv(time, status) ~ .'), data=df_train, ntree=300, nodedepth=5, seed=-1)

Here’s what I wrote in the forum at the time…

I had some challenges in section 8, Random Survival Forests, with the R package, which I did work around. I’ve never used R from Python before, and didn’t have it installed in the VM I used for machine learning. This line failed for me, as it couldn’t find the package: forest = rpackages.importr('randomForestSRC', lib_loc='R')

This is what I replaced [that entire block] with, which did run and produce matching expected results. Not saying it is the only way, or even the best way, and it seems to invoke the C compiler and build the package every time it runs, but it does run. Optimization is phase n+1.
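To avoid rebuilding the package on every run, one option is to install only what is missing. The sketch below is an untested variation on the boilerplate above, not the course's code: missing_packages and install_missing are hypothetical helper names, while rpackages.isinstalled is the check that rpy2 provides. The base package is left out of the list here because it ships with R.

```python
# Sketch: install R packages from CRAN through rpy2 only when they are absent.
# missing_packages / install_missing are illustrative names, not rpy2 functions.


def missing_packages(names, is_installed):
    """Return the names for which is_installed(name) is false, in order."""
    return [name for name in names if not is_installed(name)]


def install_missing(names=("ggplot2", "randomForestSRC")):
    """Install absent packages from CRAN; returns the names that were installed."""
    # Imported here so the helper above can be used without R available.
    import rpy2.robjects.packages as rpackages
    from rpy2.robjects.vectors import StrVector

    to_install = missing_packages(names, rpackages.isinstalled)
    if to_install:
        utils = rpackages.importr("utils")
        utils.chooseCRANmirror(ind=1)  # same mirror choice as the boilerplate
        utils.install_packages(StrVector(to_install))
    return to_install
```

Calling install_missing() once before forest = rpackages.importr('randomForestSRC') should then skip the download and compile step on later runs, assuming the first install succeeded.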

Thank you for your response.

The new boilerplate code seems to be generating the same error as before.
Please see the image below.

Is it possible you also downloaded randomForestSRC yourself from CRAN perhaps? And saved the package somewhere yourself before running your code? And/or did you already have R independently installed on your machine? Just trying to offer ideas.

Best,
Stefan

I'm pretty sure randomForestSRC was downloaded and compiled on the fly every time I ran that cell. I definitely didn’t have rpy2 previously. I don’t know whether I had R installed locally, but likely not, since I was using space-constrained VMware virtual machines at the time and tried not to have anything extraneous installed.

There are some discussions of this rpy2 package-not-found error, along with workarounds related to the library path. Here, for example