Week 4 - Conversation with Software Engineer

In the interview, the proposal is to use a read replica of the database. What security and compliance policies should be in place to maintain confidentiality?

Are schema changes in databases common? Shouldn't a change control process be defined?

Agile methodologies are widely used in software development. Can we also apply them to data engineering? How would we do this?

In the Unified Process we use UML for modelling. Is it still possible to use UML to model the architecture? Are there other tools you would recommend?

When capturing requirements with agile methodologies, do we express them as user stories?

Thank you very much for your help!

Luis.

I will try to answer to the best of my ability.

In the interview, the proposal is to use a read replica of the database. What security and compliance policies should be in place to maintain confidentiality?

Assuming the course tag on this post is correct, security and compliance are touched on briefly later in the course. In my experience, though, the coverage is not very comprehensive, as the course focuses more on the functional aspects of data engineering. Security and compliance fall under what we call data governance.

Are schema changes in databases common? Shouldn't a change control process be defined?

Whether schema changes are common depends on the company and its requirements. One topic covered later in the course is data contracts, which are a set of agreements on the schema of the data. As with changes in upstream data, whether data contracts are used also depends on the company. In my view, clear and effective communication with your upstream and downstream stakeholders is key, and schema evolution is normal in a business.
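To make the idea of a data contract a bit more concrete, here is a minimal Python sketch, assuming a contract is simply an agreed mapping of column names to expected Python types. The `DataContract` class, the `contract_violations` function, and the `orders` columns are hypothetical illustrations, not something from the course or a particular tool; real contracts are often written in YAML or JSON and enforced by dedicated tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical contract: the columns and types an upstream producer
# has agreed to deliver to downstream consumers.
@dataclass(frozen=True)
class DataContract:
    name: str
    version: int
    columns: dict[str, type]


def contract_violations(contract: DataContract, record: dict) -> list[str]:
    """Return human-readable violations of the contract for one record."""
    problems = []
    # Every agreed column must be present with the agreed type.
    for column, expected in contract.columns.items():
        if column not in record:
            problems.append(f"missing column '{column}'")
        elif not isinstance(record[column], expected):
            problems.append(
                f"column '{column}' expected {expected.__name__}, "
                f"got {type(record[column]).__name__}"
            )
    # Columns nobody agreed on are also flagged (sorted for stable output).
    for column in sorted(record.keys() - contract.columns.keys()):
        problems.append(f"unexpected column '{column}'")
    return problems


# Hypothetical example contract for an "orders" feed.
orders_v1 = DataContract(
    name="orders",
    version=1,
    columns={"order_id": int, "amount": float, "currency": str},
)

print(contract_violations(orders_v1, {"order_id": 1, "amount": 9.5, "currency": "EUR"}))  # []
print(contract_violations(orders_v1, {"order_id": "1", "amount": 9.5, "region": "EU"}))
```

In this sketch an unannounced schema change upstream, such as renaming `currency` to `region`, surfaces as a list of violations instead of silently breaking downstream consumers; agreeing on a new contract version with stakeholders is where the communication mentioned above comes in.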

Agile methodologies are widely used in software development. Can we also apply them to data engineering? How would we do this?

Data engineering (DE) is essentially a subset of software engineering (SE) focused on data, and many DE principles are borrowed from SE. We have the data lifecycle alongside the SDLC, DataOps alongside DevOps, plus versioning, coding, database management, data modelling, networking, security and compliance, APIs; you get the gist.

In the Unified Process we use UML for modelling. Is it still possible to use UML to model the architecture? Are there other tools you would recommend?

UML is a modelling language tailored to software systems. I don't see it being of much use in DE, but I might be wrong on that. DE deals mainly with databases, so we mostly use entity-relationship diagrams (ERDs) to model schemas. You can certainly use sequence diagrams to model your data pipelines, but I'm not sure how much that adds compared with having the pipeline defined in code inside an orchestrator. Knowing UML is still useful, though, since it lets you speak the language your stakeholders may use.

When capturing requirements with agile methodologies, do we express them as user stories?

This relates to project management, and it depends on the company. In my experience, though, DE teams do use project management tools such as JIRA to support agile methodologies, breaking requirements down into user stories and planning sprints, just as SE teams do.

Thank you very much for your help! I still have many questions about the processes and methodologies, and I hope to resolve them in the next courses. When you work with real-time and critical systems, you need methodologies, security, and compliance, because you must ensure the operational continuity of your business. As a suggestion for those responsible for the course, it would be good to incorporate best practices and methodologies for data management, for example through readings or case studies.