Experimenting with summarization of quantitative data (real numbers).
Input
report = """
County A: Min 6.8, P10 is 12.1, P50 is 23.6, P90 is 36.79, Max 42.1. \
County B: Min 8.82, P10 is 11.7, P50 is 27.6, P90 is 42.53, Max 48.8. \
County C: Min 12.56, P10 is 16.8, P50 is 38.9, P90 45.21, Max 60.74. \
The area generates more heat than expected.
"""
Prompt as follows:
prompt = f"""
Range is defined as the difference between minimum and maximum. \
Your task is to generate a short summary of the \
report, focusing on the range and percentiles.
Summarize the report below, delimited by triple
backticks.
Report: ```{report}```
"""
response = get_completion(prompt)  # helper that sends the prompt to gpt-3.5-turbo and returns the reply text
print(response)
Result
The report presents data on three counties, A, B, and C, focusing on their minimum, maximum, and percentiles (P10, P50, and P90). County A has a range of 35.3 (from 6.8 to 42.1), County B has a range of 40.98 (from 8.82 to 48.8), and County C has a range of 48.18 (from 12.56 to 60.74). The percentiles show that County C has the highest temperatures, with a P90 of 45.21, while County A has the lowest temperatures, with a P10 of 12.1.
The report concludes that the area generates more heat than expected.
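Observation: the summary looks good, but the County B range (as defined in the prompt) is off by 1.0. It should be 48.8 - 8.82 = 39.98, not 40.98. The ranges for County A and County C are correct.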
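A quick check of the arithmetic, using the min/max values from the report and the ranges quoted in the model's result. `Decimal` is used here (an illustrative choice) so that the subtraction is exact and float rounding cannot blur the comparison.

```python
from decimal import Decimal

# Min/max values copied from the report.
counties = {
    "County A": ("6.8", "42.1"),
    "County B": ("8.82", "48.8"),
    "County C": ("12.56", "60.74"),
}

# Ranges as stated by gpt-3.5-turbo in the result above.
model_ranges = {"County A": "35.3", "County B": "40.98", "County C": "48.18"}


def value_range(lo: str, hi: str) -> Decimal:
    """Range = max - min, computed exactly with Decimal."""
    return Decimal(hi) - Decimal(lo)


exact = {name: value_range(lo, hi) for name, (lo, hi) in counties.items()}

for name, r in exact.items():
    diff = Decimal(model_ranges[name]) - r
    print(f"{name}: exact range {r}, model said {model_ranges[name]}, difference {diff}")
```

Only County B shows a non-zero difference (1.00), which matches the observation.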
Q: Is this a limitation of gpt-3.5-turbo specifically, or of LLMs in general, when working with real numbers?
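One possible workaround, sketched here as an assumption rather than a confirmed fix: do the arithmetic in Python and hand the results to the model, so it only has to restate the numbers instead of computing them. The regex, the `precomputed_ranges` function, and the prompt wording are hypothetical, and the pattern assumes every county line follows the "County X: Min <n>, ..., Max <n>." format used in the report. The sketch only builds and prints the prompt; sending it (for example through the `get_completion` helper) is left out.

```python
import re
from decimal import Decimal

# Copy of the report, one county per line.
report = """
County A: Min 6.8, P10 is 12.1, P50 is 23.6, P90 is 36.79, Max 42.1.
County B: Min 8.82, P10 is 11.7, P50 is 27.6, P90 is 42.53, Max 48.8.
County C: Min 12.56, P10 is 16.8, P50 is 38.9, P90 45.21, Max 60.74.
The area generates more heat than expected.
"""

NUM = r"(\d+(?:\.\d+)?)"  # a number such as 8.82, without the trailing period
# Hypothetical pattern: assumes "County X: Min <n>, ..., Max <n>." on a single line.
COUNTY_RE = re.compile(rf"(County \w+): Min {NUM}.*?Max {NUM}")
FENCE = "`" * 3  # triple backticks, built here to avoid a literal fence in this snippet


def precomputed_ranges(text: str) -> dict[str, Decimal]:
    """Return {county: max - min}, computed in Python instead of by the model."""
    return {name: Decimal(hi) - Decimal(lo) for name, lo, hi in COUNTY_RE.findall(text)}


ranges = precomputed_ranges(report)
range_lines = "\n".join(f"{name}: range {r}" for name, r in ranges.items())

prompt = f"""
Your task is to generate a short summary of the report delimited by triple
backticks, focusing on the ranges and percentiles.
Use these precomputed ranges exactly as given; do not recalculate them:
{range_lines}

Report: {FENCE}{report}{FENCE}
"""
print(prompt)
```

The design choice is to keep the model on the language side of the task (wording the summary) and leave exact subtraction to ordinary code, which computes it deterministically.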
Any clarification and comments are much appreciated.
Pat
