Lesson 4 - last async cell: behaviour in DLAI env vs locally hosted model

Hi experts,

I need help understanding a behaviour difference in the very last cell (async
event streaming) when running it in the DLAI environment versus against
locally hosted models via Ollama (setup further below).

The last cell of the notebook, the one that streams async events, is:

new_memory = MemorySaver()   # see comment further below
abot = Agent(model, [tool], system=prompt, checkpointer=new_memory)

messages = [HumanMessage(content="What is the weather in SF?")]
thread = {"configurable": {"thread_id": "4"}}
event_count = 0

async for event in abot.graph.astream_events({"messages": messages}, thread, version="v1"):
    event_count += 1
    kind = event["event"]

    print(f"event kind: {kind}")
    # print(f"event: {event}\n\n\n")

    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            # Empty content in the context of OpenAI means
            # that the model is asking for a tool to be invoked.
            # So we only print non-empty content
            print(content, end="|")

print(f"Total event count: {event_count}")

The sequence of "kind" values observed in the two environments is very
different: in the local run, "kind" (i.e. event["event"]) never takes the
value "on_chat_model_stream", so I never see the expected streamed tokens
(a quick tally sketch for comparing the kinds is included after the DLAI
output below). The local run produces:

event kind: on_chain_start
event kind: on_chain_start
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_start
Calling: {'name': 'tavily_search_results_json', 'args': {'query': 'San Francisco weather'}, 'id': 'call_hn85hxk2', 'type': 'tool_call'}
Back to the model!
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_start
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
Total event count: 31

The run in the DLAI environment looks like this:

event kind: on_chain_start
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chat_model_start
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_stream
event kind: on_chat_model_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_start
Calling: {'name': 'tavily_search_results_json', 'args': {'query': 'current weather in San Francisco'}, 'id': 'call_CcgdAGKqbVqX9larg7FLWwND'}
event kind: on_tool_start
Back to the model!
event kind: on_tool_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_start
event kind: on_chat_model_start
event kind: on_chat_model_stream
event kind: on_chat_model_stream
The|event kind: on_chat_model_stream
 current|event kind: on_chat_model_stream
 weather|event kind: on_chat_model_stream
 in|event kind: on_chat_model_stream
 San|event kind: on_chat_model_stream
 Francisco|event kind: on_chat_model_stream
 is|event kind: on_chat_model_stream
:

|event kind: on_chat_model_stream
-|event kind: on_chat_model_stream
 **|event kind: on_chat_model_stream
Temperature|event kind: on_chat_model_stream
**|event kind: on_chat_model_stream
:|event kind: on_chat_model_stream
 |event kind: on_chat_model_stream
18|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
2|event kind: on_chat_model_stream
°C|event kind: on_chat_model_stream
 (|event kind: on_chat_model_stream
64|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
8|event kind: on_chat_model_stream
°F|event kind: on_chat_model_stream
)
|event kind: on_chat_model_stream
-|event kind: on_chat_model_stream
 **|event kind: on_chat_model_stream
Condition|event kind: on_chat_model_stream
**|event kind: on_chat_model_stream
:|event kind: on_chat_model_stream
 Mist|event kind: on_chat_model_stream

|event kind: on_chat_model_stream
-|event kind: on_chat_model_stream
 **|event kind: on_chat_model_stream
Wind|event kind: on_chat_model_stream
**|event kind: on_chat_model_stream
:|event kind: on_chat_model_stream
 |event kind: on_chat_model_stream
6|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
0|event kind: on_chat_model_stream
 mph|event kind: on_chat_model_stream
 (|event kind: on_chat_model_stream
9|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
7|event kind: on_chat_model_stream
 k|event kind: on_chat_model_stream
ph|event kind: on_chat_model_stream
)|event kind: on_chat_model_stream
 from|event kind: on_chat_model_stream
 the|event kind: on_chat_model_stream
 west|event kind: on_chat_model_stream

|event kind: on_chat_model_stream
-|event kind: on_chat_model_stream
 **|event kind: on_chat_model_stream
Humidity|event kind: on_chat_model_stream
**|event kind: on_chat_model_stream
:|event kind: on_chat_model_stream
 |event kind: on_chat_model_stream
77|event kind: on_chat_model_stream
%
|event kind: on_chat_model_stream
-|event kind: on_chat_model_stream
 **|event kind: on_chat_model_stream
Visibility|event kind: on_chat_model_stream
**|event kind: on_chat_model_stream
:|event kind: on_chat_model_stream
 |event kind: on_chat_model_stream
9|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
7|event kind: on_chat_model_stream
 km|event kind: on_chat_model_stream
 (|event kind: on_chat_model_stream
6|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
0|event kind: on_chat_model_stream
 miles|event kind: on_chat_model_stream
)
|event kind: on_chat_model_stream
-|event kind: on_chat_model_stream
 **|event kind: on_chat_model_stream
UV|event kind: on_chat_model_stream
 Index|event kind: on_chat_model_stream
**|event kind: on_chat_model_stream
:|event kind: on_chat_model_stream
 |event kind: on_chat_model_stream
0|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
5|event kind: on_chat_model_stream


|event kind: on_chat_model_stream
The|event kind: on_chat_model_stream
 skies|event kind: on_chat_model_stream
 are|event kind: on_chat_model_stream
 mostly|event kind: on_chat_model_stream
 clear|event kind: on_chat_model_stream
 with|event kind: on_chat_model_stream
 mist|event kind: on_chat_model_stream
,|event kind: on_chat_model_stream
 and|event kind: on_chat_model_stream
 there|event kind: on_chat_model_stream
 is|event kind: on_chat_model_stream
 no|event kind: on_chat_model_stream
 precipitation|event kind: on_chat_model_stream
.|event kind: on_chat_model_stream
event kind: on_chat_model_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_start
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
event kind: on_chain_stream
event kind: on_chain_end
Total event count: 148
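
To make the two runs easier to compare, something like the quick tally below
should work (just a sketch that reuses the same agent and messages; the
separate thread_id "kind-tally" is only there so it does not mix with the
runs above):

from collections import Counter

# Sketch: count how many times each event kind occurs in one run,
# e.g. to spot that "on_chat_model_stream" never shows up locally.
kind_counts = Counter()
tally_thread = {"configurable": {"thread_id": "kind-tally"}}   # hypothetical thread id

async for event in abot.graph.astream_events({"messages": messages}, tally_thread, version="v1"):
    kind_counts[event["event"]] += 1

for kind, count in kind_counts.most_common():
    print(f"{kind}: {count}")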

The local run is set up as follows:

_OLLAMA_ = True
if _OLLAMA_:
    _MODEL_NAME_ = "llama3.2:3b"   # among a few others tested
    _LLM_SERVER_ = "http://localhost:11434"
    model = ChatOpenAI(model=_MODEL_NAME_, base_url=f"{_LLM_SERVER_}/v1/", api_key="ollama", temperature=0)
else:
    # model = ChatOpenAI(model="gpt-3.5-turbo")  # reduce inference cost
    model = ChatOpenAI(model="gpt-4o")

Also, due to the known issue with the Lesson 4 checkpointer, which has been
discussed in other threads for this course, I am using the simple workaround
suggested there:

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
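
If I am reading the LangGraph API correctly, the state of a thread can be
read back after a run to confirm this checkpointer is doing its job, roughly
like this (sketch only):

# Sanity-check sketch: after a run, read the persisted state for the
# thread to confirm the MemorySaver checkpointer actually stores it.
snapshot = abot.graph.get_state(thread)
print(snapshot.values["messages"][-1].content)   # last message in the thread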

For comparison, I also modified the DLAI notebook to use MemorySaver() in
another run in the DLAI environment, and it streams the expected output just
as with the original async checkpointer.

Nothing else is changed in the notebook for the local run. The net result is
that I don't get any streamed events for the actual response, since no event
of kind "on_chat_model_stream" ever appears. I have also tried other models
with tool support (qwen2.5, mistral, llama3.2:3b, phi4-mini), to no avail.
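
To narrow it down further, one check I still plan to do is whether the local
model streams tokens at all outside of LangGraph, for example by streaming
from the model directly (sketch; prompt text is arbitrary):

# Sketch: stream straight from the model, bypassing the graph, to see
# whether token-level streaming works at all against the Ollama endpoint.
async for chunk in model.astream("Say hello in five words."):
    print(chunk.content, end="|", flush=True)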

This seems to be related to the local environment. Any idea how I can fix the
local run so that I get the expected streamed output, as when running in the
DLAI environment?

Thanks,
MCW