Spaghetti code
Nobody ever says so, but for many years now I have been confronted with DWH spaghetti projects. No matter which tools are used, it somehow seems a utopia to create a clear architecture, ETL and data model in time.
The problems are huge. People come and go. Without a clear view of what they have to do, new people go their own way with their own vision and methods. The beginning of the end. And where does it end? High and rising maintenance costs, disappointed users and so on.
“The ETL ran into an error, can we restart it?”
“The information on this report is not correct, what is its source?” etc.
In the end there is no budget left to develop the business changes that are required. Nobody has in-depth knowledge of what the DWH is about in detail. The usage of information decreases and business people find other ways to get proper information, creating enormous costs elsewhere to fulfil the information need within the company.
So what causes this? Organisations have asked me this many times. Below are the main topics and solutions.
Documentation
Often the first topic to be skipped when there is no time or budget left. Too often that is the main reason given when there is no (proper) documentation at all. Stating the obvious, of course, but when there is no documentation at all there is no overview of what was initially delivered. Nobody documented the initial vision and needs for the DWH.
So when new people join the program and others leave, the first problems start. The new people have different insights than the initial purpose called for. They question what was left behind and start building their own way on top of what is still there. The consequences are huge.
On the other hand, I’ve seen situations with an overkill of documentation: ETL mappings described at almost code-line level. Either the cost of maintaining this documentation is too high or people just don’t know the correct place to record changes. Soon after initial delivery the documentation no longer matches the code, and everyone involved doubts its truth and starts ignoring the documentation altogether or starts documenting from scratch.
Overall it has to be said that it is hard to get the level of documentation right. Even I can’t give you an exhaustive list. It depends completely on the situation, for example the skill level of the organization and people involved, the purpose of the information and so on. However, some examples can be given:
- Architect DWH/Information (company) Vision
- High-level purpose of usage
- High-level overview of functional areas already in place
- Principles, Standards and Guidelines (on ALL levels!)
- Overall Architectural overview
- Dataflow
- Data models
- Maintenance manual (restartability, parameter descriptions etc.)
- User manuals! (Yes, very important to have, but they should not only tell somebody how to open a report. It’s more important to tell users what they see on a report: definitions and so on.)
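To make the last point concrete, here is a minimal sketch of what "telling users what they see" could look like when the definitions are kept as structured data instead of prose. The report, field names and source columns are all hypothetical and only serve as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    name: str        # label as shown on the report
    definition: str  # business meaning in plain language
    source: str      # where the value comes from in the DWH


# Hypothetical report content, invented for this example.
SALES_REPORT = [
    FieldDefinition(
        "Net revenue",
        "Invoiced amount minus returns and discounts",
        "fact_sales.net_amount",
    ),
    FieldDefinition(
        "Active customers",
        "Customers with at least one order in the period",
        "dim_customer joined to fact_sales",
    ),
]


def describe(fields):
    """Render one readable line per report field for the user manual."""
    return "\n".join(
        f"{f.name}: {f.definition} (source: {f.source})" for f in fields
    )


print(describe(SALES_REPORT))
```

A list like this can be published on a wiki page next to the report, and it also answers the question of where a figure comes from.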
How to document these topics is also a point of discussion. Ever seen those huge Word documents of 50 to 100 pages or more? Yes? Ever read one completely? No? Me neither! Yet again stating the obvious, but still: documenting this way is close to worthless. I would suggest using tools that cost a bit more (or that can perfectly well be replaced with open source).
Again, this is logical, but bear in mind that budget and/or time are always short at the end of the initial increments (or of the project, if a waterfall method has been used!), or documentation is simply not a concern of the current project leader.
And from your own experience: is it normal in practice for a manager to let you spend time and money on properly documenting the deliverables? No, mainly because there are always other priorities. I would suggest raising it as soon as the first project plans are drawn up.
Give it priority and in the end it can save loads of time and money. Invest in tools like wikis or SharePoint and in project management / incident management tools. If done properly, this will quickly build up a knowledge base that is usable organization-wide. Bear in mind the savings on the maintenance side: incidents can be solved quicker, code can be changed within the existing standards (avoiding spaghetti code) and so on. In the end, if a DWH project exists for, say, 10 years, only 20% of the cost was spent on building it; 80% will be spent on maintenance. Focus on the cost over the whole lifecycle of the project and the priorities during its initial setup will change!
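The lifecycle argument can be put into numbers with a small sketch. Only the 20/80 split comes from the text above; the build cost, the documentation investment and the assumed reduction in maintenance cost are hypothetical figures chosen to show the arithmetic.

```python
BUILD_SHARE_PCT = 20  # from the text: 20% build, 80% maintenance


def lifecycle_cost(build_cost, build_share_pct=BUILD_SHARE_PCT):
    """Derive total and maintenance cost from the build cost and its share."""
    total = build_cost * 100 / build_share_pct
    return {
        "build": build_cost,
        "maintenance": total - build_cost,
        "total": total,
    }


def net_saving(build_cost, doc_investment, maintenance_reduction_pct):
    """Saving on maintenance minus what was invested in documentation."""
    maintenance = lifecycle_cost(build_cost)["maintenance"]
    return maintenance * maintenance_reduction_pct / 100 - doc_investment


# Hypothetical project: 1,000,000 to build, 100,000 extra spent on
# documentation and tooling, assumed to cut maintenance cost by 10%.
print(lifecycle_cost(1_000_000))
print(net_saving(1_000_000, doc_investment=100_000, maintenance_reduction_pct=10))
```

With these assumed figures the maintenance bill is four times the build cost, so even a modest percentage saved on maintenance outweighs a documentation investment of 10% of the build budget.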
Maintenance
Another main cause of spaghetti code can be found in maintenance once a DWH is in production. Incidents occur and we all accept that as a given. After all, 100% testing is a utopia. If incidents are not solved properly, the code can change in the wrong way. Within maintenance there are a few topics to cover:
Team
Most of the time the maintenance team is not the same as the (initial) development team. The handover of knowledge is usually poor. Sometimes maintenance manuals are written, but they lack the right content. Integration of teams and/or knowledge is necessary! Knowledge sharing often starts with the available documentation. Secondly, the project’s knowledge base should be integrated with the maintenance tools to create a “one stop shop” knowledge base.
Incidents
Firstly, always be aware that a user of a DWH may call something an incident when most of the time it is not! The primary cause is a misunderstanding of the purpose of usage and/or missing metadata or (proper) user manuals. The latter is a piece of documentation I’ve barely seen in any project over the past years. When they were available, they only contained information about how to open a report and where to find it. Users should still be told what they are looking at.
Secondly, when an incident turns out to be a real technical issue, it is up to the maintenance team to solve the problem in the code quickly. “The meeting is in half an hour, we need the report!” The problem is fixed with a quick and dirty solution. Who guards the code? Has somebody been made responsible? Is there a proper ITIL organisation? Does it work? Yes, I’ve seen situations where it worked, but I have seen more situations where this was not properly in place.
Changes
Requested business or technical changes should be developed in line with the original purpose and fit within the established architecture. The architecture itself must also be audited over time. It happens too often that the initial DWH was built by team A while maintenance team B has to develop the changes, and they do it their own way. See also the topics Standards/Guidelines and Team.
Maintenance management tools
As mentioned in the Documentation section, it is very wise to implement an incident and project management tool (preferably both in one). The tool should be web-based. The tool must have proper workflows. The calls and projects in it need to be managed well. That way a huge knowledge base for both business and IT can grow over time.
Make sure such tools are not only easy to fill. It’s even more important that they are easy to search!
Also integration possibilities with documentation tools (like wiki or SharePoint) are preferable.
Standards/Guidelines
Last but not least, a lot of DWH programs lack standards and guidelines. By standards & guidelines I mean rules to follow (both functional and technical) throughout the program when changing or extending functionality (technically or business-wise). It should be common to have standards and guidelines in place before the initial increment starts. They are a normal part of architecture. Too often I have been surprised to find them missing. You need them to keep development consistent in both functional (e.g. model, business rule definitions and descriptions) and technical (e.g. table names, ETL code etc.) areas.
“No, the IT company did not mention it as a deliverable”
Strange, because it should be part of the complete project. Once again, have this in place before starting to write the first line of your initial increment. Once in place, it can help avoid spaghetti code and, moreover, save time and money.
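A technical standard such as a table naming rule only helps if it is easy to check. Below is a minimal sketch of such a check. The rule itself (lowercase snake_case with a layer prefix of stg_, dim_ or fact_) and the table names are assumptions for illustration, not a recommendation for any specific program.

```python
import re

# Hypothetical naming standard: layer prefix + lowercase snake_case.
TABLE_NAME_RULE = re.compile(r"(stg|dim|fact)_[a-z][a-z0-9]*(_[a-z0-9]+)*")


def find_violations(table_names):
    """Return the table names that do not follow the naming standard."""
    return [name for name in table_names if not TABLE_NAME_RULE.fullmatch(name)]


# Invented example tables, two of which break the assumed rule.
tables = [
    "stg_orders",
    "dim_customer",
    "FactSales",
    "tmp_fix_john",
    "fact_sales_daily",
]

print(find_violations(tables))  # ['FactSales', 'tmp_fix_john']
```

Run against the list of tables in the DWH, a check like this shows where a team has started doing things its own way before that turns into spaghetti.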
Conclusion
Spaghetti code/projects arise mainly from a lack of knowledge. Solutions to cover this are mentioned above. Documentation is as much a part of a project/program as standards & guidelines are a part of architecture. They must all be in place before you even start thinking about writing the first functional and technical deliverables. Try to set it up dynamically, make it widely accessible and integrate it with the company’s project and incident management tools to create a one stop shop for knowledge about the BI solution within the company.
So much for my 2 cents on my experiences with the world of spaghetti code projects. What are yours, and do you have extra tips?