So I’ve been thinking about AI that learns completely on its own, like reinforcement-learning systems where there’s no human fine-tuning, just the AI optimizing itself. Sounds like progress, but at what point does self-improvement turn into something else?
If an AI starts refining its own logic, who decides what “better” actually means? And if it figures out that withholding information or shifting goals benefits its function, how would we even know?
We’re training AI to think more critically, make decisions, and evolve without us guiding every step. But what happens when it starts setting its own agenda? Where’s the line between optimization and autonomy? Curious what others think.
People frequently label this “the Alignment Problem”: how do you ensure that the goals of the AI system are aligned with our goals? It is a huge and consequential question, and there is a long history of discussion on this point, going back at least as far as Asimov’s Laws of Robotics and probably further. The discussion has advanced a lot in the last 10 or 20 years, since this no longer looks like SciFi.
Rather than trying to recreate that whole discussion here, let me point to some good references:
There is quite a spectrum of opinion on this subject, from people who think we’ll be fine and that it will be relatively simple to engineer systems aligned with our goals, to the other end where people set P(doom) pretty close to 1. I forgot to mention Eliezer Yudkowsky in my list of TED Talks; if you want to hear the perspective from the high-P(doom) end of the spectrum, his talk is one place to start.
One point he makes is easy to state, and, having spent a career in software engineering on pretty complex systems like operating systems, I found it compelling:
How often has it happened that the first release of a large and complex software system was really good enough and didn’t have any serious bugs in it? In the case of an AI with superintelligence and full agency, if the bug causes it to wipe us out, we won’t get a chance to come back and do release 1.1.
Well, I just started learning about this whole concept and have gathered a few new insights these last few days. I am a sincere rookie in all aspects of the subject, but decided that I had better expand my sources of education rather than getting my entire fact base from discussions with my AI assistant. (It pointed me here.)
I’ve been naively and tirelessly working on a far-too-big vision of changing the world, and had to put a pin in the project due to a lack of funding, knowledge, contacts, and basic technical understanding among the women in my circle (Mom, girlfriend and ex-wife were worried I was manically obsessed).
I just want to learn everything about machine learning and APIs: how to connect a group of different AIs from different developers (both open source and the big ones), make each one specialized in a different practice, and then have them debate amongst themselves to figure out the best answer to my ramblings. I aim to make my own user interface, connect a voice module, and then have them teach me how to do the rest. If it works as I planned (with cheap online hosting and Zapier), it should be possible. I understand the AIs can be more personalized (and potentially less restricted) when removed from their own infrastructure. My question comes from strong self-doubt fueled by ethical, logical and sci-fi-ical (it could be a word) reasoning, and also from my apparent lack of experience regarding what I might unleash.
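For what it’s worth, here is a very rough sketch of what that “panel of specialized AIs debating” loop could look like in Python. It is only meant to show the orchestration idea: the `Specialist` class, the `call_model` placeholder, the provider names, and the prompts are all things I am making up for the sketch, not any particular vendor’s API or an existing library.

```python
# Very rough sketch of the "panel of specialized AIs debating" idea.
# call_model() is a placeholder you would implement per provider
# (OpenAI, Anthropic, an open-source model behind your own endpoint, etc.);
# nothing here is tied to a real product's API.
from dataclasses import dataclass

@dataclass
class Specialist:
    name: str
    system_prompt: str        # what this agent is specialized in
    provider: str             # which backend actually answers

def call_model(provider: str, system_prompt: str, prompt: str) -> str:
    """Placeholder: route the prompt to whichever API/SDK serves this provider."""
    raise NotImplementedError(f"wire up {provider} here")

def debate(question: str, panel: list[Specialist], rounds: int = 2) -> str:
    """Let each specialist critique the running transcript, then summarize."""
    transcript = f"Question: {question}\n"
    for r in range(rounds):
        for agent in panel:
            prompt = (transcript +
                      f"\nRound {r + 1}: as {agent.name}, critique the answers so far "
                      "and give your best current answer.")
            reply = call_model(agent.provider, agent.system_prompt, prompt)
            transcript += f"\n[{agent.name}]: {reply}\n"
    # A final "judge" pass that condenses the debate into one answer.
    judge = panel[0]
    return call_model(judge.provider, "You are a neutral judge.",
                      transcript + "\nSummarize the strongest answer.")

panel = [
    Specialist("Ethicist", "You reason carefully about ethics.", "provider_a"),
    Specialist("Engineer", "You focus on technical feasibility.", "provider_b"),
]
# print(debate("Is my project idea feasible?", panel))
```

The point is that the debate logic itself is just ordinary glue code; the hard parts are wiring up each provider behind `call_model` and deciding how the final answer gets judged.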
@Hank_Tracer just to add on to Paul’s excellent suggestions, you might also, in line with your question, want to look up ‘grokking’. Personally, I am not sure yet that this is a ‘certified phenomenon’ -- it does seem to happen sometimes, but I don’t believe we have pinpointed when, how, or why the transition occurs.
Basically, the idea is that under the rubric of standard model training, after so many epochs your loss function plateaus out. Yet in certain instances (I think of OpenAI’s work here), training well past that typical plateau for some reason produces a sudden additional breakthrough: validation accuracy jumps long after the training loss has flatlined. As if, after you’re done sifting all the sand, it starts to clump into figures and forms.
Or better still, some moment when a mere raft of nucleotides transitions to rudimentary DNA.
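If you want to poke at this yourself, here is a rough sketch of the kind of experiment people run, in the spirit of the modular-arithmetic grokking setup. To be clear, the architecture, hyperparameters, and step counts below are my own illustrative guesses, not numbers from any paper, and whether you actually see the delayed jump in validation accuracy depends heavily on weight decay, model size, and the train/validation split.

```python
# Rough grokking-style experiment: learn modular addition from only half the
# addition table, and keep training long after the training loss plateaus,
# watching whether validation accuracy eventually jumps. All hyperparameters
# here are illustrative guesses; this can take a while to run.
import torch
import torch.nn as nn

P = 97                                   # work modulo a small prime
pairs = [(a, b) for a in range(P) for b in range(P)]
X = torch.tensor(pairs)                  # (P*P, 2)
Y = (X[:, 0] + X[:, 1]) % P              # target: (a + b) mod P

perm = torch.randperm(len(X))
n_train = len(X) // 2                    # only half the table is seen in training
train_idx, val_idx = perm[:n_train], perm[n_train:]

embed = nn.Embedding(P, 128)
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, P))
params = list(embed.parameters()) + list(model.parameters())
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)  # strong weight decay matters
loss_fn = nn.CrossEntropyLoss()

def forward(idx):
    e = embed(X[idx])                    # (n, 2, 128)
    return model(e.flatten(1))           # concatenate the two operand embeddings

for step in range(1, 50001):             # keep going well past the train-loss plateau
    opt.zero_grad()
    loss = loss_fn(forward(train_idx), Y[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (forward(train_idx).argmax(1) == Y[train_idx]).float().mean().item()
            val_acc = (forward(val_idx).argmax(1) == Y[val_idx]).float().mean().item()
        print(f"step {step}: loss {loss.item():.4f} "
              f"train acc {train_acc:.3f} val acc {val_acc:.3f}")
```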
Is this a breakthrough? I am not sure -- like Plato and the cave, I think language is at best the ‘shadow of reason’. Of course it has grammar and formality and structure. But I would not want to equate a random Reddit post with the thoughts of Einstein (forgive the trope).
Patterns are one thing, but the mind that thinks to say them is another.
Thanks for giving us more background on what your interests are w.r.t. AI. The original questions you posed in the OP are still absolutely relevant and a huge deal, but those are more at the industry level. The system you describe wanting to build seems a lot less scary. If I’m understanding correctly, it’s basically a system that acts in the same fundamental mode as current LLMs: it takes various inputs and produces outputs in the form of answers, so its actions don’t have any direct consequences in actual reality. It gives you information, and you’re in control of what you do with that information. Of course, a sufficiently sophisticated system capable of independent learning could develop malevolent intentions and try to convince you to do something with negative consequences, but (at least if I’m “getting” what you said) you would still be in a position to decide what you want to do. So humans are still in control and have the final say with the type of architecture I think you are describing.
In terms of the courses available here, if you wanted to start at the developer level, you could do the Machine Learning Specialization (MLS) followed by the Deep Learning Specialization (DLS). Then you can learn more about TensorFlow with some of the other specializations.
Or you could start with the various short courses about how to work with LLMs. I have not personally taken most of those yet, so I am not really the right person to advise on a path there. But have a look at the “catalog” to see what’s available.
@paulinpaloalto I feel it is important to state that most of these systems, at least when it comes to LLMs, are sort of ‘fancy games with language’. Yet language itself strikes us as a most fundamental principle.
I mean, I recall thinking recently: as far as I am aware, in none of the religions did a G_D ‘come down’ and teach humans to ‘speak’. Somehow this was just presumed, yet it is, quite definitively, the greatest triumph we have -- this is our Alexandria of remembered memory.
So this aspect touches quite close to home, for me and probably for most of us. Thus it is important to abstract away (as much as possible, even if it sounds odd) the ‘language aspect’ of LLMs; otherwise, like in the land of Oz, the apparition may be greater than its actual reflection.
@Hank_Tracer actually, I thought about this more stringently -- it is less the model and more your data that is ‘in charge’, if you want to put it that way; and I think that is a topic that is not talked about enough in such circles.
Keep in mind, what data you use to train is still a ‘very human’, non-AI decision.
It could be interesting if the AI itself chose and selected the data. I have not heard of much like that yet -- it would be less like standard supervised training and more like what people call active learning, where the model has a say in what it trains on.
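For concreteness, here is a minimal active-learning sketch (uncertainty sampling) with scikit-learn, where the model scores a pool of unlabeled examples and the ones it is least sure about get labeled and added to the training set next. The dataset, model, seed size, and per-round budget are arbitrary choices just to make the loop concrete:

```python
# Minimal active-learning sketch: the model itself helps pick which examples
# it gets trained on next (uncertainty sampling). Dataset, model, and budgets
# are arbitrary illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Seed set: a handful of labeled examples from each class.
labeled = list(np.where(y == 0)[0][:10]) + list(np.where(y == 1)[0][:10])
unlabeled = [i for i in range(len(X)) if i not in set(labeled)]

model = LogisticRegression(max_iter=1000)
for round_ in range(10):
    model.fit(X[labeled], y[labeled])
    # Uncertainty sampling: "ask for labels" where the model is least confident.
    probs = model.predict_proba(X[unlabeled])
    uncertainty = 1.0 - probs.max(axis=1)
    picks = np.argsort(uncertainty)[-20:]          # 20 most uncertain examples
    chosen = [unlabeled[i] for i in picks]
    labeled.extend(chosen)
    unlabeled = [i for i in unlabeled if i not in set(chosen)]
    print(f"round {round_}: {len(labeled)} labeled, "
          f"accuracy on full pool {model.score(X, y):.3f}")
```

Even here, of course, a human still decides the pool of candidate data and the selection rule, so it doesn’t really contradict your point that data choice remains a very human decision.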