Today, I’m filled with excitement and a bit of nervousness as I share a project that means a great deal to me. It reflects not only what I’ve learned but also my aspiration to make a meaningful impact through data science. I’ve developed a classification algorithm to predict the probability of breast cancer, a project that merges my passion for data science with the hope of contributing to healthcare advancements, or to any other area where data science can make a positive difference.
About the Project
This project was born from my desire to apply my skills in a way that could potentially grow into something useful. Using Python, pandas, matplotlib, and scikit-learn, I ventured into the realm of medical prediction, focusing on breast cancer - a cause that’s deeply personal to many.
Initial Steps: I started with the essentials - importing the libraries and understanding the dataset. My goal was to ensure that every line of code contributed towards a clearer, more accurate prediction.
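For readers who want to follow along, a minimal sketch of this setup step might look like the following; the file name and column names here are assumptions based on the public Breast Cancer Wisconsin (Diagnostic) dataset, not the exact code from my notebook:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")           # hypothetical file name
print(df.shape)                        # number of records and columns
print(df["diagnosis"].value_counts())  # class balance: malignant (M) vs benign (B)
print(df[["radius_mean", "texture_mean"]].describe())

# Quick visual check of how the two candidate features separate the classes
for label, group in df.groupby("diagnosis"):
    plt.scatter(group["radius_mean"], group["texture_mean"], label=label, alpha=0.5)
plt.xlabel("radius_mean")
plt.ylabel("texture_mean")
plt.legend()
plt.show()
```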
Modeling Breast Cancer Prediction: I chose Logistic Regression for its efficacy in binary classification. The model uses ‘radius_mean’ and ‘texture_mean’ to predict whether a tumor is malignant or benign. This choice was driven by the model’s interpretability and its reliability on medical datasets.
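Continuing the sketch above, the modelling step could look roughly like this; the train/test split parameters are illustrative, not necessarily the ones used in the project:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two predictors and a binary target: 1 = malignant (M), 0 = benign (B)
X = df[["radius_mean", "texture_mean"]]
y = (df["diagnosis"] == "M").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)
```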
Evaluation and Insights: Through metrics like the accuracy score and ROC AUC score, I gauged the model’s performance. More importantly, I developed predict_tumor_probability to directly apply this model in predicting breast cancer’s nature based on specific features.
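A sketch of the evaluation, plus a helper in the spirit of predict_tumor_probability (the exact signature in the project may differ), continuing from the snippet above:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]   # probability of the malignant class

print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC AUC :", roc_auc_score(y_test, y_prob))

def predict_tumor_probability(radius_mean, texture_mean):
    """Return the predicted probability that a tumor is malignant."""
    sample = pd.DataFrame(
        [[radius_mean, texture_mean]], columns=["radius_mean", "texture_mean"]
    )
    return model.predict_proba(sample)[0, 1]

print(predict_tumor_probability(17.99, 10.38))
```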
Sharing with Purpose
My primary motivation in sharing this project is twofold. First, I seek your invaluable feedback to refine and enhance the model further; your insights and suggestions will be instrumental in pushing this project beyond its current boundaries. Second, I envision this as a stepping stone for collaborative efforts, whether through improving this model or embarking on new projects together.
Looking Forward
As I share this project with you, I’m reminded of the immense potential we hold as a community. This project is but a single step in a journey I’m eager to continue, filled with learning, challenges, and hopefully, significant contributions to technology through AI.
I invite you all to dive into the project, share your thoughts, and perhaps, consider how we might join forces to explore new horizons in AI.
Hi 3than. Thanks for sharing your work. You mention the metrics that you used but the results are not shown. It would be nice to see your validation results and your confusion matrix. I suggest that you publish a Kaggle notebook, if possible.
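(For reference, a generic way to produce those validation artifacts with scikit-learn, assuming a fitted model and a held-out X_test / y_test split as described in the post above, is sketched below.)

```python
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=["benign", "malignant"]))
print(confusion_matrix(y_test, y_pred))

# Plot the confusion matrix for the held-out set
ConfusionMatrixDisplay.from_predictions(
    y_test, y_pred, display_labels=["benign", "malignant"]
)
plt.show()
```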
@vbookshelf I had a quick look at your GitHub; my question may already be answered somewhere there. Could you share what computational resources you used and what your training approach was (e.g. transfer learning)? Thanks and cheers.
Your project is interesting. I see that the data you were working with had 10 features to choose from, and you used two of them, as you mentioned: radius and texture. I am wondering about the process you went through when choosing these two out of the ten in order to get your results.
Certainly! All 10 variables are very valuable in this type of detection. The reason I picked those two was to get a model that could be as accurate as possible and to deliver a decent project, not only to the community but also to myself, as a reflection of my aspirations and my willingness to keep learning.
I believe it is pretty decent work as I keep growing in this field!
Those who review your project are unlikely to be impressed by performance that only looks good because you made the problem simpler (by ignoring 80% of the features).
I recommend you expand your work to include more of the dataset’s features.
Perhaps this would give you a path to showing your skill at comparing the performance of simple vs. complex models.
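One possible way to set up that comparison is sketched below; the column names are assumptions based on the Breast Cancer Wisconsin dataset, and the "_mean" columns simply stand in for "more of the dataset's features".

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

two_features = ["radius_mean", "texture_mean"]
all_mean_features = [c for c in df.columns if c.endswith("_mean")]  # the 10 "_mean" columns

for name, cols in [("2 features", two_features), ("10 features", all_mean_features)]:
    # Scale the inputs, then fit a logistic regression; score with 5-fold cross-validation
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, df[cols], y, cv=5, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {scores.mean():.3f} (+/- {scores.std():.3f})")
```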
I didn’t measure how much value the project may have lost by not considering all the features, since even with two features the model had to analyze over 500 records. Your feedback is a very good point; I will work on it!
Approach: Two Yolov5 models, trained separately and ensembled during inference. In essence, Yolov5 was fine-tuned using the training data (a rough sketch of one way such an ensemble can work is included after the links below).
Hardware: 2 x RTX A5000 GPUs, hired on vast.ai
This is the model card. It’s a summary of training, validation, hardware used and other app details. Model Card
All the training notebooks, including the cell outputs, are available. This is one example. It shows the step by step process that I followed. Jupyter Notebook
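For illustration only (this is not the author's actual code): one simple way to ensemble two YOLOv5 models at inference is to merge their detections and apply non-maximum suppression. The weight paths, image name, and thresholds below are placeholders, and the real project may use a different merging scheme.

```python
import torch
from torchvision.ops import batched_nms

# Placeholder weight paths for two separately trained YOLOv5 models
model_a = torch.hub.load("ultralytics/yolov5", "custom", path="weights_a.pt")
model_b = torch.hub.load("ultralytics/yolov5", "custom", path="weights_b.pt")

img = "example.jpg"                 # placeholder input image
det_a = model_a(img).xyxy[0]        # detections as [x1, y1, x2, y2, conf, cls]
det_b = model_b(img).xyxy[0]

# Merge both sets of boxes, then filter overlaps with class-aware NMS
merged = torch.cat([det_a, det_b], dim=0)
keep = batched_nms(merged[:, :4], merged[:, 4], merged[:, 5].long(), iou_threshold=0.5)
ensembled = merged[keep]            # final ensembled detections
```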