Any advice on a PhD proposal regarding machine learning

First and foremost, I wish to extend the heartiest of hellos and the warmest of welcomes to everyone here. I recently was introduced to the world of machine learning through some wondrous circumstance, and have been sitting here for a week straight in complete awe of some applications that it could be used for.

I am currently applying for fellowships for a PhD program and have decided to incorporate a machine learning algorithm at its core. I do not know how well machine learning can be applied to my thesis, as I do not have a coding/math background, but rather a political science/philosophy background. Nonetheless, my symbolic logic classes have been seeing me through all this code and math.

For my thesis, I wish to redefine [negative] democracy in the philosophical rendition of positive democracy. To cut the definitions of the respective versions of democracy woefully short, negative democracy pertains to the absence of constraints on freedom, whereas positive democracy ensures the ability to access those freedoms (e.g., you have the freedom to go to a top university, but because you do not have any resources, you do not actually have that freedom).

The part where machine learning comes in is the creation of various activation layers, which would take my input variables (e.g., income inequality, gender inequality, race inequality, etc.) and produce an output ranging from 0 to 1 that defines how ‘democratic’ a country is according to my definition (the input variables).
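For readers picturing the setup described above, here is a minimal sketch in plain Python of a small feed-forward network whose final sigmoid activation squashes the result into the range 0 to 1. Everything specific in it is an assumption for illustration: the three input features, their made-up values, the hidden-layer size, and the random, untrained weights. The printed number therefore carries no meaning about any country; it only shows the shape of the computation.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Common hidden-layer activation: pass positives, zero out negatives."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def random_params(n_inputs, n_hidden, seed=0):
    """Random (untrained) weights; a real model would learn these from labels."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-1, 1) for _ in range(n_hidden)]],
        "b2": [0.0],
    }

def score(features, params):
    """Hidden ReLU layer followed by a single sigmoid output unit."""
    hidden = dense(features, params["w1"], params["b1"], relu)
    (out,) = dense(hidden, params["w2"], params["b2"], sigmoid)
    return out

# Hypothetical, made-up inputs scaled to [0, 1]:
# income inequality, gender inequality, race inequality.
example_country = [0.35, 0.20, 0.40]
params = random_params(n_inputs=3, n_hidden=4)
print(score(example_country, params))
```

The sigmoid on the last layer is what guarantees an output between 0 and 1. What that output means depends entirely on the labels the weights are trained against.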

Mind you, calling my machine learning competence basic would be an overestimation; but for whatever reason, I find this very intuitive and fun (in a philosophical sense). Therefore, I humbly ask for any help in the creation of this thesis: would it be viable? Doable in 3-4 years? How many layers should I be looking at? What would be the recommended number of training examples? (I was thinking just the OECD countries to start.)

Thank you so very much for your advice and kind attention,


Hello @Naamveer_Singh,

Let’s focus on your question “would it be viable?” first. I think we can take the following as some challenges to your thesis idea, or as some open-ended questions for discussion.

  1. A supervised learning algorithm requires labels. Are you the one who will label the degree of “democracy” for the countries? Or, more simply, do you have those labels?

  2. What is the purpose of the algorithm? Is it to predict the labels for countries that don’t have them? Or are you going to look for patterns derived by the algorithm (even though they might be challenging to interpret)? Is the purpose going to be a good one?

  3. Is your data an aggregation over a certain time period (like 5 years), prepared for speaking about the country’s degree of “democracy” in that period as a whole? Or is your data a continuous representation of the country’s last (e.g.) 100 years, prepared for speaking about how the country’s degree of “democracy” changes decade to decade? The latter is obviously more interesting but also more challenging, because your dataset needs to capture more systematics that might seem unimportant today but were important a long time ago.

  4. Do you have the dataset? Or at least an idea of how to collect the data? Do you have access to subject matter expert opinion so that you can build a model incorporating their opinions?

  5. If you have the dataset, have you done any preliminary analysis with it that you can share with us?

I think your answers to these questions should give you some idea of how viable it is. As for the number of layers and the number of examples, they depend on the complexity of your problem and the number of useful features you can get. In practice, I think the real question is how many examples you can get. A good model is always thirsty for examples.
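To make the point about examples concrete, here is a rough sketch that counts the trainable parameters of a fully connected network and compares them with the number of examples available. The figures are assumptions for illustration only: 5 input features, and roughly 38 OECD countries observed over 5 years. Comparing the two counts is only a crude rule of thumb, not a formal requirement.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the units per layer, inputs first, output last.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Assumption for illustration: about 38 countries x 5 yearly observations.
n_examples = 38 * 5

# Hypothetical architectures, each with 5 input features and 1 output.
architectures = {
    "no hidden layer": [5, 1],
    "one hidden layer of 8": [5, 8, 1],
    "two hidden layers of 32": [5, 32, 32, 1],
}

for name, sizes in architectures.items():
    n_params = dense_param_count(sizes)
    print(f"{name}: {n_params} parameters vs {n_examples} examples")
```

Under these assumed numbers, even a modest two-hidden-layer network has several times more parameters than there are examples, which is one reason to start with very small models when the dataset is country-level.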

If you haven’t considered any of the above questions, please take your time, do some research, and perhaps discuss them with some researchers. Our discussion won’t go very far if your responses aren’t backed up by your research. After all, this is about your next 3-4 years, so you are the best person to start building the bigger picture and a roadmap for which you are willing to take the risk and responsibility. If your conclusion is that it is not doable, then I won’t try to convince you otherwise.


Mr Raymond Kwok,

Your considerate reply is very deeply appreciated; it was well thought out and presents the problem in a very logical manner. I thank you profusely for it.

To simplify the purpose statement, I wish to give a quantifiable variable to a created definition. I am hoping that the algorithm would be able to use given variables, such as a Gini coefficient score (which measures a particular country’s income inequality) plus other similar variables, to create a ranking (based upon the output variable).

As this would be my first attempt at a machine learning project on my own, I am trying to keep it as simple as possible and would most likely use a limited time frame of data, probably 5 years. I would not create the dataset myself, in the sense of creating the labels from certain variables; for ease of calculation, it would instead consist of variables found in datasets from sources such as the United Nations, federal reports, etc.

In regard to subject matter experts, I am privileged to be able to ask some of my professors or other individuals for their thoughts on the matter, including this forum and such brilliant members as yourself!

As for any preliminary analysis, I have done a theoretical framework on the qualitative aspects, but am designing the quantitative part at this moment. This project would be a personal passion of mine, but if it is not viable, there are a multitude of other examples (a bit more of the concrete variety) which I could try to incorporate. At the end of the day, I am really excited about the endless possibilities of machine learning in social science research (e.g. creating an interactive global map of poverty giving live updates) and on various other social justice issues. But alas, that is for the future; at this point I am just trying to find a way to incorporate this process into my research, and for your assistance I thank you again, Raymond. I wish to become a proficient machine learner, and for that, a simple and doable (PhD) project of my own would be an exceptional start.

Like I said, the PhD would be 3-4 years, so I’ve got some time before then to formulate a solid theory (a year at least). Also, I am only about halfway through the certificate course, so with a bit more proficiency and practice I should know whether my project is doable or not.

You have given me plenty and then more to think about,

Thank you very much Mr. Kwok

You are welcome @Naamveer_Singh. It seems to me that you are (for now) taking it as a small project, so in the future, if you have a presentable analysis of your dataset and modeling results, please feel free to share them with us here, and maybe some of us can give you some feedback. As for modeling results, besides what you would like to show, since I guess you are taking some deep learning courses, you will probably learn which things are important to inspect when figuring out how good your model is. One of those things is the learning curve. Please share those with us as well if you want feedback.
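As a rough illustration of one common form of learning curve (training loss and validation loss plotted against the number of training examples), here is a minimal sketch in plain Python. It is not tied to any real dataset: the data are synthetic, and the one-feature logistic regression, learning rate, step count, and noise level are all assumptions chosen to keep the example short.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average binary cross-entropy; eps guards against log(0)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def fit(xs, ys, lr=0.5, steps=500):
    """One-feature logistic regression trained by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def make_data(n, rng):
    """Synthetic data: label is 1 when a noisy copy of x is positive."""
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [1 if x + rng.gauss(0, 0.7) > 0 else 0 for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
val_x, val_y = make_data(200, rng)

# Train on growing subsets and record (size, training loss, validation loss).
curve = []
for m in (5, 10, 20, 50, 100, 200):
    w, b = fit(train_x[:m], train_y[:m])
    curve.append((
        m,
        log_loss(w, b, train_x[:m], train_y[:m]),
        log_loss(w, b, val_x, val_y),
    ))

for m, train_loss, val_loss in curve:
    print(f"m={m:3d}  train_loss={train_loss:.3f}  val_loss={val_loss:.3f}")
```

When reading such a curve, a large and persistent gap between the two losses usually suggests overfitting (more examples or a simpler model may help), while two losses that are both high and close together usually suggest the model or the features are too weak.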

The analysis of your dataset should be at least as important as your modeling work.



Mr. Kwok,

I very much intend to do so. Your thoughtful insight on the matter is forever appreciated.

Thank you again,