In the example with the LLM, where you are calculating the proportion of users who would rate the app 5 stars, shouldn’t you be looking at all users who downloaded the app, not just the ones who rated it?
My interpretation of the results demoed is that, of the users who rated the app, we can say with 95% confidence that between 76% and 91.8% will give a five-star rating. But that’s different than the proportion of all users who will give a five-star rating.
It’s true that the population of users who rate the app is not necessarily representative of all users who downloaded it, but generalizing to all users would require a different study design or extra assumptions about the rating behavior of non-respondents. Since the objective here is to estimate the five-star proportion among raters, the inference is fine. Trying to generalize to all users without evidence that raters and non-raters behave similarly would risk introducing bias, so the cautious and statistically valid approach is to stick to the sampling frame of raters.
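As a minimal sketch of what that interval estimates, here is how a 95% confidence interval for the five-star proportion among raters could be computed. The counts below are hypothetical (not the demo's actual data), and the result only describes users who rated the app, not everyone who downloaded it.

```python
# Sketch: 95% CI for the proportion of *raters* giving five stars.
# Counts are hypothetical placeholders, not the numbers from the demo.
from statsmodels.stats.proportion import proportion_confint

n_five_star = 84   # hypothetical number of five-star ratings
n_raters = 100     # hypothetical number of users who rated the app

low, high = proportion_confint(n_five_star, n_raters, alpha=0.05, method="wilson")
print(f"95% CI for five-star proportion among raters: [{low:.3f}, {high:.3f}]")
```

Whatever the actual counts, the denominator is the number of raters, so the interval speaks only to that sampling frame.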
Hope it helps! Feel free to ask if you need further assistance.