Hi,
The discrepancies with the video start to appear in the "Run the document through the Unstructured API" part of the notebook.
The output of JSON(json.dumps(resp.elements[0:3], indent=2)) differs from what is shown in the video. For example, the element IDs are different, and the metadata shows category_depth instead of page_number, …
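To compare my output with the video more systematically, I count which metadata keys actually come back. This is a minimal sketch of my own, not part of the notebook. It assumes each element is a dict with a "metadata" dict, which is how resp.elements is indexed in the notebook; metadata_key_counts is a hypothetical helper, and the sample elements are made up.

```python
import json
from collections import Counter


def metadata_key_counts(elements):
    """Count how many elements carry each metadata key.

    Assumes each element is a dict with an optional "metadata" dict,
    which is how resp.elements is indexed in the notebook.
    """
    counts = Counter()
    for element in elements:
        counts.update(element.get("metadata", {}).keys())
    return counts


# Hypothetical stand-in for resp.elements[0:3]; the values are made up.
sample_elements = [
    {"type": "Title", "element_id": "id-1", "text": "ICE-HOCKEY",
     "metadata": {"category_depth": 0}},
    {"type": "NarrativeText", "element_id": "id-2", "text": "Some text.",
     "metadata": {"parent_id": "id-1"}},
]

# Shows how many elements carry each metadata key.
print(json.dumps(dict(metadata_key_counts(sample_elements)), indent=2))
```

Running this on the real resp.elements makes it easy to see at a glance whether page_number is present at all.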
The results also differ in the "Find elements associated with chapters" section: for example, the "text" values are different (the same kinds of differences as above), chapter_ids stays empty instead of being populated with key-value pairs, and the following error is raised:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[14], line 2
      1 chapter_to_id = {v: k for k, v in chapter_ids.items()}
----> 2 [x for x in resp.elements if x["metadata"].get("parent_id") == chapter_to_id["ICE-HOCKEY"]][0]

Cell In[14], line 2, in (.0)
      1 chapter_to_id = {v: k for k, v in chapter_ids.items()}
----> 2 [x for x in resp.elements if x["metadata"].get("parent_id") == chapter_to_id["ICE-HOCKEY"]][0]

KeyError: 'ICE-HOCKEY'
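The KeyError itself follows directly from chapter_ids being empty: inverting an empty dict gives an empty chapter_to_id, so looking up "ICE-HOCKEY" fails. A defensive version of that cell could look like the sketch below. children_of_chapter is a hypothetical helper of my own; it assumes, as the notebook's inversion implies, that chapter_ids maps element IDs to chapter titles, and the sample elements are made up.

```python
def children_of_chapter(elements, chapter_ids, chapter):
    """Return the elements whose parent_id is the element ID of `chapter`.

    chapter_ids maps element_id -> chapter title, as built in the notebook.
    Instead of raising KeyError when the chapter was never matched (e.g.
    because chapter_ids is empty), this returns an empty list.
    """
    # Invert element_id -> title into title -> element_id.
    chapter_to_id = {title: element_id for element_id, title in chapter_ids.items()}
    parent = chapter_to_id.get(chapter)
    if parent is None:
        return []
    return [x for x in elements if x["metadata"].get("parent_id") == parent]


# Made-up elements for illustration only.
elements = [
    {"element_id": "id-1", "text": "ICE-HOCKEY", "metadata": {}},
    {"element_id": "id-2", "text": "First paragraph.", "metadata": {"parent_id": "id-1"}},
]

print(children_of_chapter(elements, {}, "ICE-HOCKEY"))  # [] rather than KeyError
print(children_of_chapter(elements, {"id-1": "ICE-HOCKEY"}, "ICE-HOCKEY")[0]["text"])
```

Checking for an empty result before taking [0] separates the two possible failures: the chapter title never being matched versus no elements pointing at it through parent_id.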
When I manually modify the list of chapters, I get somewhat better results, but so far I have not been able to reproduce exactly what is shown in the video.
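If the notebook fills chapter_ids by comparing element text to the chapter list by exact string equality (that is my reading of the cell, so treat it as an assumption), then small changes in the returned "text" would explain both the empty chapter_ids and why editing the chapter list helps. A looser comparison could be sketched as follows. normalize_title and match_chapters are hypothetical helpers of my own, restricting matches to elements of type "Title" is an assumption about how the headings come back, and the sample elements are made up.

```python
import unicodedata


def normalize_title(text):
    """Loosen a title for comparison: NFKC-normalize, straighten curly
    apostrophes, collapse whitespace, and upper-case."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    return " ".join(text.split()).upper()


def match_chapters(elements, chapters):
    """Build element_id -> chapter title for Title elements whose normalized
    text equals a normalized entry of `chapters`."""
    wanted = {normalize_title(c): c for c in chapters}
    chapter_ids = {}
    for element in elements:
        # Assumption: chapter headings come back with type "Title".
        if element.get("type") != "Title":
            continue
        key = normalize_title(element.get("text", ""))
        if key in wanted:
            chapter_ids[element["element_id"]] = wanted[key]
    return chapter_ids


# Made-up elements: the title differs from the list only in case and spacing.
elements = [
    {"type": "Title", "element_id": "id-1", "text": "Ice-hockey ", "metadata": {}},
    {"type": "NarrativeText", "element_id": "id-2", "text": "ICE-HOCKEY", "metadata": {}},
]
print(match_chapters(elements, ["ICE-HOCKEY"]))  # {'id-1': 'ICE-HOCKEY'}
```

The result keeps the original chapter names as values, so the rest of the notebook can still look chapters up by the titles in the list.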
That said, I don't think this blocks understanding of the concepts.
Many thanks in advance for your feedback and help.
Cheers,
O.