Determining When to Use External Search or Rely on an LLM's Intrinsic Knowledge

How can an LLM decide when to rely on its own knowledge and when to run an external search (e.g., browsing Wikipedia via a tool)?

Would it be effective to let the LLM gauge its certainty in an answer from its token probabilities? Does anyone have experience with this?
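To make the idea concrete, here is a minimal sketch of what I have in mind. It assumes you already have the per-token log-probabilities of a draft answer, which many inference APIs and local runtimes can return. The helper names (`sequence_confidence`, `should_search`) and both thresholds are hypothetical placeholders that would need tuning on held-out questions. It also assumes raw token probabilities are a usable confidence signal, which is exactly what I'm unsure about, since they are not always well calibrated.

```python
import math
from typing import Dict, Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> Dict[str, float]:
    """Summarize per-token natural-log probabilities of a draft answer."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return {
        # Geometric-mean probability per token (exp of the mean log-prob).
        "geo_mean_prob": math.exp(mean_logprob),
        # Probability of the least certain token; a single shaky name or
        # date can matter more than the average.
        "min_prob": math.exp(min(token_logprobs)),
        # Perplexity = exp(-mean log-prob); higher means less certain.
        "perplexity": math.exp(-mean_logprob),
    }


def should_search(
    token_logprobs: Sequence[float],
    min_prob_threshold: float = 0.3,   # hypothetical value, tune on real data
    geo_mean_threshold: float = 0.7,   # hypothetical value, tune on real data
) -> bool:
    """Return True if the draft answer looks uncertain enough to trigger search."""
    scores = sequence_confidence(token_logprobs)
    return (
        scores["min_prob"] < min_prob_threshold
        or scores["geo_mean_prob"] < geo_mean_threshold
    )


if __name__ == "__main__":
    confident = [math.log(p) for p in (0.95, 0.90, 0.98)]
    uncertain = [math.log(p) for p in (0.95, 0.10, 0.90)]
    print(should_search(confident))  # False: every token is high-probability
    print(should_search(uncertain))  # True: one token has probability 0.10
```

I check both the weakest token and the average because a fluent answer can have a high mean probability while hinging on one low-confidence fact.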