AI help with project code generation

I am working with an example data set and building code for a machine learning project. I tried to get some help with code generation from ChatGPT. Is there a better AI tool to help with building project code? Thank you.

First question: Have you attended any Machine Learning courses?

Tom,
Yes sir. I have taken two of Dr. Ng’s machine learning courses.
Sincerely,
Steve S,

Then you should not need to use ChatGPT for coding help.

Unless the data set or model is extremely complicated, you should be able to adapt one of the labs you have already worked through.

Can you add some information about the dataset you are using, and what sort of model you need?

Tom,
Thank you for the advice and confidence.
I am using this data set: https://archive.ics.uci.edu/dataset/320/student+performance from the UCI Machine Learning Repository site.

I don’t think it is a complicated model. I think I will be using a linear regression model but there are some categorical variables. Could use some help to get me started.
Thank you.
Steve S.

Have you attended any course that uses Pandas? It’s very handy for pre-processing the dataset to get it ready for use (converts categorical variables to one-hot form, etc).
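
To give a rough idea of what that pre-processing looks like, here is a minimal sketch using `pd.get_dummies`. The rows are made-up toy values, and the column names (`sex`, `Mjob`, `studytime`, `G3`) are only assumed to match the ones you end up using from the Student Performance data.

```python
import pandas as pd

# Toy rows with made-up values; the column names are assumed to mirror a few
# columns of the UCI Student Performance data.
df = pd.DataFrame({
    "sex": ["F", "M", "F", "M"],
    "Mjob": ["teacher", "health", "other", "teacher"],
    "studytime": [2, 1, 3, 2],
    "G3": [15, 10, 17, 12],
})

# One-hot encode the text columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["sex", "Mjob"], dtype=int)
print(encoded.columns.tolist())
```

For a linear regression you may also want to pass `drop_first=True`, which drops one level per categorical column so the one-hot columns are not redundant with each other.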

@Steve_Senteio if you want to go ‘above and beyond’ a bit, I can see how this dataset would be ripe for a little feature engineering. Also, depending on what library/tool/model you end up working with, you might find it useful to convert your categorical variables (for example ‘sex’, with values ‘M/F’) either into ‘factors’ or into a categorical set of numbers (in this case it could be binary: ‘0, 1’).
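
A small sketch of both options in pandas, assuming toy values and the column names `sex` and `school`. The pandas `category` dtype plays roughly the role that factors play in R.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "sex": ["F", "M", "M", "F"],
    "school": ["GP", "MS", "GP", "GP"],
})

# Option 1: explicit 0/1 mapping for a two-level column.
df["sex_binary"] = df["sex"].map({"F": 0, "M": 1})

# Option 2: convert to the "category" dtype and read off the integer codes.
df["school"] = df["school"].astype("category")
df["school_code"] = df["school"].cat.codes

print(df)
```

Integer codes are fine for two-level columns, but for columns with more than two unordered levels they imply an ordering that is not really there, so one-hot encoding is usually the safer choice for a linear model.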

Anthony,
Thank you so much! Good idea!
Any other features in that data set that you think I should massage?
Thanks again!
Sincerely,
Steve s.

Tom,
I have not yet. Is there a course on the deep learning site you would recommend?
Thanks for your advice again!
Sincerely,
Steve s.

@Steve_Senteio I will let you think about that, as it is part of the learning process.

But… after any necessary conversion, I would be curious to check the correlation between the columns across the data set. Important, though: remember that correlation != causation.

Yet it might provide you with a good jumping off point to think about these things.
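
As a hedged illustration of that correlation check, here is a sketch with pandas on made-up numbers. The column names `studytime`, `absences` and `G3` are assumptions, and the frame is assumed to be all numeric already (that is, after any conversion).

```python
import pandas as pd

# Made-up numeric values for illustration; not taken from the real dataset.
df = pd.DataFrame({
    "studytime": [1, 2, 3, 4, 2],
    "absences": [10, 6, 2, 0, 4],
    "G3": [8, 11, 15, 18, 12],
})

# Pairwise Pearson correlation between all columns.
corr = df.corr()

# Rank the columns by how strongly they correlate with the final grade.
print(corr["G3"].sort_values(ascending=False))
```

A high value here only says two columns move together in this sample; it says nothing about which one, if either, drives the other.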

@Steve_Senteio, which of Andrew’s courses have you attended?

If you have used scikit-learn or TensorFlow, you probably can just find a Pandas tutorial online (YouTube, etc).

After a glance at your dataset, the only thing I see that’s noteworthy is that all of the Categorical features need to be converted to one-hot logical values.

It’s far easier to use Pandas or scikit-learn for that rather than hand-write code for each of the categorical types. scikit-learn calls these converters “Encoders”.
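
For reference, a minimal sketch of the scikit-learn route, assuming toy data and the column names `sex`, `Mjob`, `studytime` and `G3`. It wires a `OneHotEncoder` into a `ColumnTransformer` and feeds the result to a `LinearRegression`; treat it as one possible starting point, not the way the labs do it.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up rows for illustration only.
df = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F", "M"],
    "Mjob": ["teacher", "health", "other", "teacher", "health", "other"],
    "studytime": [2, 1, 3, 2, 4, 1],
    "G3": [15, 10, 17, 12, 18, 9],
})
X = df.drop(columns="G3")
y = df["G3"]

# One-hot encode the categorical columns; leave the numeric column as it is.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["sex", "Mjob"]),
    ],
    remainder="passthrough",
)

# Chain the encoding and the regression so both are fitted together.
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Keeping the encoder inside the pipeline means the same encoding is applied automatically to any new data passed to `model.predict`.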

Pandas is a super important skill to have in data science.

Here is a link to one of the best Pandas courses out there.

Cheers

Thank you Dan, Tom, and Anthony!
Super helpful info.
I love this Deep learning community!

Steve s.

Anaconda offers an assistant fine-tuned for data analysis, which I tried and which works well enough.
It is integrated into their cloud notebooks, which are a JupyterLab flavour, and it can run locally too.
It is well made. It does good refactoring and linting, which saves time, and it will competently guide you through learning specific aspects of DA and ML.
On the free tier, you get a maximum of 30 calls per day.
https://docs.anaconda.com/free/anaconda-notebooks/anaconda-assistant/

Personally, I use local models running via Ollama or LM Studio, both of which can spin up a server that Visual Studio plugins can access.
The server is also useful for having local models do some basic tasks on data, like entity recognition, with some prompt engineering.
Starcoder2 and deepseek-coder are both decent models for coding (refactoring and guidance), if you have reasonably fast hardware to run them on.

Large online models are useful at times for defining the steps to get somewhere with the data you have, and they sometimes suggest interesting options. For instance, you can ask Gemini or Claude 2 to suggest what steps you could take to analyze a specific dataset or project. If you do a good job with prompt engineering, you can turn it into an interactive process, with the model adjusting its suggestions based on the additional info and requirements you provide.

But no model will output complex code for you, going from requirements to deployment. They are useful as assistants, if used cleverly.

In my experience, Microsoft Copilot is the best at providing code, followed by Gemini and then ChatGPT.

Thank you!