I just need a clarification, if possible.
In the notebook Explore Phase - Exploring Air Quality Data, the Bogotá air quality data come from the Bogotá Air Quality Monitoring Network. Although sulfur oxide is mentioned as a pollutant and is included in the original dataset, it has been excluded from this analysis. Why? Perhaps because too much data is missing for this pollutant, since the sensors were under maintenance? Thanks in advance for your input.
Hey, @Michela_Agostini
It’s great to see your thorough exploration of the dataset. It is indeed sensible to exclude features with a substantial amount of missing data. This practice helps our models generalize and keeps noisy or redundant features out of the analysis. In particular, readings from sensors undergoing maintenance may be unreliable and therefore unsuitable for inclusion.
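To make that exclusion rule concrete, here is a minimal sketch in plain Python that drops pollutant columns whose share of missing readings exceeds a cutoff. The 30% threshold, the function names, and the toy readings are hypothetical; the notebook does not say which rule (if any) it used to drop sulfur dioxide. Returning the dropped columns together with their missing share also makes the exclusion easy to report.

```python
from typing import Dict, List, Optional, Tuple

Readings = Dict[str, List[Optional[float]]]


def missing_fraction(values: List[Optional[float]]) -> float:
    """Share of readings that are missing (None)."""
    if not values:
        return 1.0  # treat an empty column as entirely missing
    return sum(v is None for v in values) / len(values)


def drop_sparse_pollutants(
    readings: Readings, max_missing: float = 0.3
) -> Tuple[Readings, Dict[str, float]]:
    """Split pollutant columns into kept and dropped by missing-data share.

    max_missing is a hypothetical cutoff, not the notebook's actual rule.
    """
    kept: Readings = {}
    dropped: Dict[str, float] = {}
    for name, values in readings.items():
        frac = missing_fraction(values)
        if frac > max_missing:
            dropped[name] = frac  # keep the share so the exclusion can be reported
        else:
            kept[name] = values
    return kept, dropped


# Toy hourly readings (made-up numbers, not real Bogotá data)
readings = {
    "PM2.5": [12.0, 15.5, None, 18.2, 14.1],
    "SO2": [None, None, 3.1, None, None],
}
kept, dropped = drop_sparse_pollutants(readings)
print(sorted(kept))  # ['PM2.5']
print(dropped)       # {'SO2': 0.8}
```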
As for how sulfur dioxide compares with other pollutants, it is typically assigned a severity rating between 5 and 7 on a scale of 10. This rating signifies that sulfur dioxide has notable environmental and health impacts, but it might not be the most severe pollutant.
Best,
Abhinav
Thank you for explaining further how missing data in the dataset affects the generalization of the model.
The “low” severity rating of sulfur dioxide compared to the other pollutants is interesting. I agree with you that this rating could be a strong justification for excluding sulfur dioxide from the explore phase of the air quality data.
Regards, Michela
Hello Abhinav,
When you mention this severity scale for pollutants, could you tell me what parameters the assessment is based on? Is it derived from standardized parameters or from a dataset?
From a medical perspective, I know there are lethal dose parameters for such agents. Is the rating based on guidelines from the National Institute for Occupational Safety and Health (NIOSH)?
Although I agree with the exclusion based on missing data, the purpose of the analysis was more about how much air quality affects the environment of Bogotá.
This exploratory analysis tries to relate which components of the air affect the environment. I know it is about quality, but quality is in turn measured by the amounts of the different pollutants in the air. So any occupationally hazardous pollutant should be included, as an outlier and/or as a variable, even if it does not corroborate the analysis.
What do you think about this?
Regards
DP
Hey, @Deepti_Prasad
Sulfur dioxide is indeed a highly corrosive and toxic gas. Exposure to elevated levels of sulfur dioxide can pose an immediate threat to both health and life. I based my earlier statements on information from various articles and drew conclusions from general information, so they may not cover every aspect from a medical perspective.
Sulfur Dioxide | ToxFAQs™ | ATSDR (cdc.gov)
10 Most Harmful Airborne Pollutants you’re breathing everyday | IQAir
Sulfur Dioxide Effects on Health - Air (U.S. National Park Service) (nps.gov)
Regarding data exploration, it’s essential to consider all important features in a dataset. However, it’s worth noting that noisy data, especially when obtained from sensors undergoing maintenance, can sometimes be disregarded or treated with caution when drawing conclusions.
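One way to treat readings taken during maintenance with caution is to mark them as missing rather than keep them as valid values (for example, zeros reported while a sensor is offline). The sketch below assumes a hypothetical maintenance log given as half-open hour windows `(start, end)`; the monitoring network's actual log format is not described in this thread.

```python
from typing import List, Optional, Tuple


def mask_maintenance(
    hours: List[int],
    values: List[Optional[float]],
    windows: List[Tuple[int, int]],
) -> List[Optional[float]]:
    """Return a copy of `values` where readings inside any maintenance
    window [start, end) are replaced by None, i.e. treated as missing."""
    if len(hours) != len(values):
        raise ValueError("hours and values must have the same length")
    masked: List[Optional[float]] = []
    for hour, value in zip(hours, values):
        in_maintenance = any(start <= hour < end for start, end in windows)
        masked.append(None if in_maintenance else value)
    return masked


# Made-up SO2 readings; the sensor reports 0.0 while it is offline
hours = [0, 1, 2, 3, 4, 5]
so2 = [2.9, 3.0, 0.0, 0.0, 3.2, 3.4]
windows = [(2, 4)]  # hypothetical maintenance log entry covering hours 2 and 3
print(mask_maintenance(hours, so2, windows))  # [2.9, 3.0, None, None, 3.2, 3.4]
```

Masked readings then count toward the missing share, so they feed into whatever exclusion or imputation rule comes next instead of silently biasing the averages.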
Hope this helps.
Best,
Abhinav
So basically you are saying that the sulfur oxide data were excluded because of missing values, which I 100% agree with statistically.
But then the model's analysis of Bogotá's air quality, which is the overall aim, would not be as accurate as it could be, since it should include all the hazardous pollutants. Am I understanding this part correctly?
I only wanted to know how this could be addressed. By collecting more data?!
Regards
DP
Quoting “…can sometimes be disregarded or treated with caution when drawing conclusions.”
When dealing with a dataset that contains missing values, there are various strategies to address the issue, ranging from simply discarding the incomplete data to imputing the missing values with more advanced techniques.
Additionally, it’s valuable to assess how sensitive your results are to the imputation process. As you mentioned, another option is to augment the dataset by gathering more data to minimize the impact of missing values. Assessing the uncertainty introduced by imputation is crucial, as it tells you how much confidence you can place in your conclusions. Also, keep in mind that imputing data can affect your model, so its performance requires a thorough evaluation.
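A minimal sketch of such a sensitivity check in plain Python: compute the same summary statistics under three strategies (complete-case analysis, mean imputation, and linear interpolation) and compare them. It assumes evenly spaced readings and uses a made-up series; the function names are illustrative. If the strategies disagree noticeably, conclusions drawn from that pollutant deserve extra caution.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

Series = List[Optional[float]]


def complete_case(values: Series) -> List[float]:
    """Drop missing readings entirely."""
    return [v for v in values if v is not None]


def mean_impute(values: Series) -> List[float]:
    """Replace each missing reading with the mean of the observed ones."""
    fill = mean(complete_case(values))  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in values]


def linear_interpolate(values: Series) -> List[float]:
    """Fill interior gaps on a straight line between the nearest observed
    neighbours; leading/trailing gaps copy the nearest observed value.
    Assumes the readings are evenly spaced in time."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed readings to interpolate from")
    out: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(values[right])
        elif right is None:
            out.append(values[left])
        else:
            w = (i - left) / (right - left)
            out.append(values[left] + w * (values[right] - values[left]))
    return out


def sensitivity(values: Series) -> Dict[str, Dict[str, float]]:
    """Mean and population standard deviation under each strategy."""
    strategies = {
        "complete_case": complete_case(values),
        "mean_impute": mean_impute(values),
        "linear_interpolate": linear_interpolate(values),
    }
    return {name: {"mean": mean(s), "pstdev": pstdev(s)} for name, s in strategies.items()}


# Made-up, evenly spaced readings with a three-hour gap
series = [1.0, None, None, None, 9.0, 3.0]
summaries = sensitivity(series)
for name, stats in summaries.items():
    print(f"{name:>18}: mean={stats['mean']:.2f}  pstdev={stats['pstdev']:.2f}")
```

Mean imputation leaves the mean of the observed values unchanged but shrinks the spread, which is one reason its effect on a model should be checked rather than assumed to be neutral.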
As with the many tradeoffs in ML models, you often need to strike a balance between multiple important factors. And I believe the analysis of Bogotá's air quality doesn’t depend only on sulfur dioxide. Does it?
Best,
Abhinav
Yes, it surely doesn’t depend only on sulfur oxide, but sulfur oxide may also have an effect, so excluding it can call the accuracy of the analysis into question. That is the point I am trying to make. Usually in healthcare analysis, when significant factors are excluded because of missing values or too few valid values, the published paper states this along with the reasoning behind the exclusion, and honestly notes that the results might have differed had the excluded values been included. That matters here because sulfur dioxide is an occupationally hazardous pollutant.
Regards
DP