Want Some Hints on C3M2 Assignment: Social Network Database Fix

I can only spot an issue related to duplicate records … but I still cannot pass the first unit test. Can someone give me a hint about what I should pay more attention to?


I found the hints at the bottom of the ipynb file, but I don’t understand the question “Does the random is necessary?” If np.random.choice is not necessary, does that mean I need to deduce the data insertion logic by examining the unit test cases?

I came here for this same problem and also to report the grammatically incorrect and incoherent hint.

Really struggling with this one. This code seems broken:

    # Populate Club Members table
    for club in clubs:
        num_members = np.random.randint(5, 10)
        members = np.random.choice(people, num_members)
        club.members.extend(members)

The problem is it can generate duplicate people so you add a person to a club more than once. This will generate an integrity exception on the DB. On the other hand if you try to remove the duplicates (e.g. by wrapping in set() or adding a replace=False argument), then you break the pseudo-random number sequence for the test data generation and all the tests will fail because the data doesn’t match what is expected. ChatGPT has basically sent me down the paths of “don’t generate duplicates”, “don’t add duplicates”, or “suppress exceptions in the case of adding a duplicate”, none of which seem to be the solution here.
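
To make the integrity error concrete, here is a minimal sketch using the standard-library sqlite3 module rather than the notebook's SQLAlchemy models. It assumes the association table has a composite primary key on (club_id, person_id); the thread implies a uniqueness constraint but does not show the schema, and the add_member helper is hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the assignment's association table; the real
# notebook uses SQLAlchemy models whose exact schema is not shown in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE club_members ("
    " club_id INTEGER NOT NULL,"
    " person_id INTEGER NOT NULL,"
    " PRIMARY KEY (club_id, person_id))"
)


def add_member(club_id: int, person_id: int) -> bool:
    """Insert one membership row; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO club_members (club_id, person_id) VALUES (?, ?)",
                (club_id, person_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_member(1, 7)
second = add_member(1, 7)  # same person, same club: violates the primary key
row_count = conn.execute("SELECT COUNT(*) FROM club_members").fetchone()[0]
print(first, second, row_count)  # True False 1
```

Catching the exception like this is only meant to show where the error comes from; as noted above, suppressing it does not by itself make the generated data match what the tests expect.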

The other weird thing is that the 3 functions we are supposed to add are already written. Are we supposed to correct them or something? Looks like they already work though (once the DB works).

I spent the majority of my time on this exercise. Compared with the other graded programming exercises, it seems poorly written, since most of the work is concentrated in the big fat loader function.

Geeraver already gave you a hint: make sure you’re actually inserting names instead of random selections. Also, executing the block will give you warnings; fixing them improves the overall solution.

Yeah, not sure what’s expected of this assignment.
While populating the club members, the code picks the number of members from 5-10, but in the solution there are only 4 members in some of the clubs.

I’m not sure whether any criteria are specified for how to select the people for a club, so I don’t know on what basis we are supposed to go outside the randomly selected number of people.
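
A small sketch that may help when reasoning about member counts. It relies on two documented NumPy behaviours: the upper bound of np.random.randint is exclusive, and np.random.choice samples with replacement by default, so the number of distinct people can be smaller than the number of draws. The people list is a placeholder, and whether this accounts for the 4-member clubs in the expected output is an assumption, not something the thread confirms.

```python
import numpy as np

# Placeholder names, not the assignment data.
people = [f"person_{i}" for i in range(10)]

# Local generator so the notebook's global seed is left untouched.
rng_state = np.random.RandomState(42)

# randint's upper bound is exclusive: values are 5, 6, 7, 8 or 9, never 10.
sizes = [int(rng_state.randint(5, 10)) for _ in range(1000)]

# With the default replace=True the same person can be drawn repeatedly.
# Drawing 11 times from 10 people guarantees at least one repeat.
with_replacement = rng_state.choice(people, 11)

# replace=False never repeats, but it cannot draw more people than exist.
without_replacement = rng_state.choice(people, 10, replace=False)

print(min(sizes), max(sizes))
print(len(set(with_replacement)), "distinct out of", len(with_replacement))
print(len(set(without_replacement)), "distinct out of", len(without_replacement))
```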

I was stuck on this problem for a long time. Eventually, I solved it using Hint 1 from the puzzle and the result of running unittests.test_load_dataset(load_dataset) (Failed test case: Incorrect number of persons in the database. Expected: 10 Got: 100).

I just changed one code block and then all tests passed. That worries me, because I think one change should fix one test, so I may have missed something.

Now, I think this problem has significant issues. To pass the test, you need to modify the logic for constructing test data, but there’s no guidance on how to construct the test data correctly, and the problem’s own logic for constructing test data seems reasonable. I’m afraid nobody can solve the problem without these hints.

I don’t know what skills this problem is supposed to practice. Anyway, I didn’t use an LLM because it couldn’t identify the issue.

A warning like relationship 'Club.members' will copy column clubs.id to column club_members.club_id, which conflicts with relationship(s): 'Person.clubs' (copies clubs.id to club_members.club_id) may be a better issue to practice on with an LLM.
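
For context, SQLAlchemy typically emits this kind of warning when two relationship() declarations write to the same association table without being linked to each other. One common remedy is to connect them with back_populates. The sketch below assumes the model names from the warning (Person, Club, club_members); the "people" table name, the person_id column, and the sample rows are hypothetical, and the notebook's real models may differ.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Association table; "people" and "person_id" are assumed names.
club_members = Table(
    "club_members",
    Base.metadata,
    Column("club_id", ForeignKey("clubs.id"), primary_key=True),
    Column("person_id", ForeignKey("people.id"), primary_key=True),
)


class Person(Base):
    __tablename__ = "people"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Linked to Club.members, so both sides describe one relationship.
    clubs = relationship("Club", secondary=club_members, back_populates="members")


class Club(Base):
    __tablename__ = "clubs"
    id = Column(Integer, primary_key=True)
    description = Column(String)
    members = relationship("Person", secondary=club_members, back_populates="clubs")


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    alice = Person(name="Alice")
    chess = Club(description="Chess Club")
    chess.members.append(alice)  # also updates alice.clubs via back_populates
    session.add(chess)
    session.commit()
    member_names = [p.name for p in chess.members]
    club_names = [c.description for c in alice.clubs]

print(member_names, club_names)
```

Several posts in this thread report that the tests pass even with the warning left in place, so this is a cleanup rather than a required fix.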

Using multi-shot prompting, pointing the LLM at the exercises, and reiterating that the load function might not be right will steer you in the right direction. You can also include things like:

  1. Help me understand the steps executed in the method xxxx.
  2. After that, using mermaid notation, create a flow of these steps.
  3. Verify the steps against the questions below, etc…

For the mermaid tool, you can use the free live editor at mermaid.js.org to visualize the code flow.

Thank you. I got the first test, load_dataset, to pass. The remaining 3 functions still fail…

Mr LLM hinted these:

Explanation of Changes:

  1. Directly used the provided names and other attributes without random sampling: This ensures we have exactly 10 unique persons in the database.
  2. Reduced the number of random friendships: Since there are fewer people, fewer friendships will be created (from 200 to 20).
  3. Kept the Club members population random but within a realistic range for the 10 persons.

Thanks a lot for all the comments. There seems to be a lot of confusion regarding this assignment.

Simply put, RESET EVERYTHING.

And just solve for the cryptic message “Does the random is necessary?”
Basically, remove the random sampling and add all the people. And fix the indentation issue :slight_smile:

THAT’S IT. No other changes are needed in the entire notebook. All tests will pass.

Just have single entries without the randomness for Person; it should all work.
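
A rough sketch of the difference between the two approaches, using a plain dataclass instead of the notebook's ORM model. The names, ages, and both loader functions are hypothetical placeholders; the notebook ships its own sample data and its own Person class.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Person:  # plain stand-in for the notebook's ORM model
    name: str
    age: int


# Hypothetical sample data; the notebook provides its own lists.
names = ["Alice", "Bob", "Charlie", "David", "Eve"]
ages = [25, 30, 35, 40, 45]


def load_people_sampled(n: int, seed: int = 42) -> List[Person]:
    """Random style: sample attributes, so the same name appears many times."""
    rng = random.Random(seed)
    return [Person(rng.choice(names), rng.choice(ages)) for _ in range(n)]


def load_people_direct() -> List[Person]:
    """Direct style: one Person per provided entry, in the given order."""
    return [Person(name, age) for name, age in zip(names, ages)]


sampled = load_people_sampled(100)
direct = load_people_direct()
print(len(sampled), "rows,", len({p.name for p in sampled}), "distinct names")
print(len(direct), "rows:", [p.name for p in direct])
```

The direct version yields exactly as many people as there are entries in the sample data, which is the property the first unit test reportedly checks (Expected: 10).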

Yes, it would pass. But what is the change/hint based on?

Using 200 people would be an absolutely valid criterion for generating the dataset.
Why do we need to use 10 people and not 200?
Are we deciding the functionality based on test failures?
The tests themselves are supposed to be based on actual requirements.

So I’m not sure what the objective of this exercise is.

Maybe this whole assignment is about reverse engineering. To use the results of the unit tests to deduce the content of the Person table:

Chain of reasoning:

  1. The result from “unittests.test_load_dataset(load_dataset)” indicates that there should only be 10 persons in the Person table.

  2. The results from “unittests.test_get_club_members(load_dataset, get_club_members)” showed that in all the clubs, there are no members with the same names.

  3. The results from “unittests.test_get_friends_of_person(load_dataset, get_friends_of_person)” showed that no person is a friend of another person of the same name.

  4. So this implies that there is some kind of constraint to be applied on the Person table. However, implementing a constraint via the database schema will result in errors when calling “load_dataset()”.

  5. So it seems the constraint has to be “built into” the data loading process.

  6. Looking at the “sample data” of names provided, this array of names seems appropriate to be used to populate the two database tables Person and Club that would satisfy (1) - (3).

  7. So now it’s about the ordering of the names such that it would produce the same random choices generated by “np.random.seed(42)” used to populate the “Friendships” and “Club Members” tables.

Fortunately the initial order of names worked; otherwise we would need to search through permutations of the order of the names. Of course we could use the LLM to produce a search program, but then this becomes a totally different exercise from the intent of this assignment.
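
Step 7 can be illustrated with a small sketch: a fixed seed reproduces the same sequence of index choices, so which names those indices map to depends entirely on the order of the people list. This uses the standard-library random module as a stand-in for np.random.seed(42), and the names and the pick_members helper are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical list of 10 names; the notebook provides its own.
names = [
    "Alice", "Bob", "Charlie", "David", "Eve",
    "Frank", "Grace", "Heidi", "Ivan", "Judy",
]


def pick_members(people: List[str], seed: int = 42, k: int = 4) -> Tuple[List[int], List[str]]:
    """Pick k positions with a seeded generator and map them to names."""
    rng = random.Random(seed)
    indices = rng.sample(range(len(people)), k)
    return indices, [people[i] for i in indices]


idx_a, members_a = pick_members(names)
idx_b, members_b = pick_members(list(reversed(names)))

print(idx_a == idx_b)          # True: same seed, same positions
print(members_a == members_b)  # False: the positions point at different names
```

This is why seeded test data only matches the expected results when the people are inserted in the same order the test author used.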

Now I wonder how it could be possible for the LLM to produce the chain of reasoning as shown above :grin:

Objective is to learn Reverse Engineering :scream:

Instruct the LLM to hard-code, in a database format, all the combinations implied by the Failed Cases you receive as a response. Just work on def load_dataset(), arranging everything there; the rest of the functions don’t need any modifications at all. Just run them as they are and hopefully you’ll get All Tests Passed. Good luck.

This assignment has a tremendously easy solution, but it takes you on a really bumpy journey. From the start it is not immediately clear what you have to do: modify the main function or the 3 functions to pass the tests. Maybe it went through several iterations and the final assignment came out messy. In the end, the only hint you have to follow is the first one: it clearly tells you to strip out the randomness from the way the people list is constructed, and that alone lets you pass the assignment, since the other issues are already taken care of in the code. You can ignore the warnings and leave the relationships at 200; it doesn’t make a difference. The only important step is having the people table in the right order.

This assignment has several misleading sentences (I don’t know if they were put there on purpose or not). They ask you to implement three functions… that are already implemented. And one of them has a different name in the text (exercise 1). Both “mistakes?” made me wonder whether this whole assignment was meant to trick us, to trick the LLM, or whether they just forgot to remove the functions’ implementations before uploading the ipynb. Who knows!

Hello everyone,

We apologize for any confusion caused by the current assignment. We are in the process of revising it to ensure the instructions are clearer and the solutions can be easily replicated.

Additionally, we acknowledge that the solution was inadvertently included in the assignment. This has already been fixed. Please expect the updated version of the assignment to be available by Monday.

Regarding the 200 insertions in the tables, the dataset is actually given in the code, and they are unique persons, so adding 200 entries is not correct.

Best,
Lucas

This has been such a waste of time. On the one hand, yes, it seems like the Person test data might be the actual people, so they shouldn’t be randomly sampled. But then the code right after makes 200 friendships, which seems nuts if there are only 10 people. That’s what threw me: it stopped me from dropping the random sampling of the people data and sent me chasing rabbit holes that led nowhere instead. Plus the due date (for me) is 3 days past already.

Is this now fixed? I am fighting with the first test, and I do not understand the badly written hint #1. What does that hint even mean? How can I know how many persons should be created when there are 200 friendships? Not everyone can be everyone’s friend, etc. THIS IS BAD!

Now there is an updated notebook, and the weird test data generation phase is totally removed. So there are still 3 functions to fix, and maybe some of the data model too, but that confusing part is gone. I moved all my lab files to a self-made folder “old” and then used the menu Help/Get latest version to get fresh files in my work folder. Part 1/3, I passed it in 2 minutes ;). Thanks for the update! Now it seems to work.
EDITED: OK, I passed the first 2; the last one seems to be tricky. Has anyone passed it in this new fixed notebook? Do not give code, just tell me whether it is possible to pass. The unit test reports PASSED for the last function, but the grader does not accept it.