Long Time No See
What started as a simple year-end prep project at work took on a life of its own (as these things will). I don’t want to bore you with the details, so I will tell you the moral of the story ahead of time in case you do not feel like reading on. It is this: moving code from a development environment to your production client without doing full-spectrum testing is never a good decision.
I am not a software engineer (though I play one on TV, apparently), so please excuse my botched use of terminology and the places where I make up words for concepts that actually have accepted names in the software community as a whole.
Whilst doing the prep, we realized that our SAP production client was not configured past the end of 2013. In the first week of December, we were all alarmed to discover that the code we needed to continue running payroll and a few other modules had started referencing the first month of 2014 (due to a design flaw). As the system didn’t contain the necessary tables, we rushed to move them out of development. While this ‘fix’ enabled us to start performing day-to-day activities again, the new tables didn’t quite sync up with the existing tables we had been utilizing in 2012 & 2013. As a result, the production client kept attempting to sync past periods with its new disconnected reality, leading to all sorts of unintended consequences.
There were workarounds to keep the data in sync while we attempted to create permanent fixes in the development environments, but they were very manual and time consuming, and at the end of the day they really only made the issue much worse in the production environment, for three reasons. (A) They didn’t really fix the problem in any sort of long-lasting way for any of the data; there was a good chance that data that was corrected would have to be corrected again in a subsequent week. (B) If they did correct the issue in the short term, data could not be fully synced with these new tables, as they referenced periods in time prior to implementation in 2012 that just didn’t exist and couldn’t be created out of thin air because of SAP’s need to have constants. In short, you could retroactively correct the disconnected reality in the short term, but if you went too far back, prior to the start of the system clock, terminal issues started arising. (C) The workarounds put the data in the parts of the system they were applied to into a state that would have to be “undone” when and if a fix ever moved out of development.
Which then brings us to the topic of development. Let me give you a little bit of the lay of the land. We implemented SAP in 2011 with one consulting firm. To say this was a failed implementation is to put it kindly, as not only did it fail, but we were utilizing this highly flawed product for over a year for payroll, whereas other modules of our business remained in our legacy (old, for you laymen out there) system. In order to do year-end activities for 2011, we had to move all of the payroll and subsequent financial data from our newly implemented system back to the legacy system when the books closed on 12/31/11. This “reverse implementation” was a lot of fun and I highly recommend it to anyone out there with sadomasochistic tendencies.
We did another implementation in 2012 with another consulting firm that was slightly less flawed in that it brought all other components of our business out of the legacy system. At best, I can say that this implementation was flawed. At that point, and even now, my opinion of the 2011 implementation clouds my ability to judge the 2012 implementation as harshly as I probably should. But it really was a rush job, without the full-scale testing that probably should have occurred.
At the beginning of 2013, after numerous issues with the 2nd implementation, we went right to the source and brought SAP themselves on board to do a full-scale analysis. What they found was alarming, but not exactly shocking, as I knew where we were failing from a methodology and design standpoint. The majority of the code in the 2nd implementation was just reworked code from the 1st and not really a full-scale implementation in and of itself. Fine. We gave the 2nd consulting firm a limited window and budget to get the job done. But the employees of SAP who took a look at the system and at us were all rather alarmed at what they found. More alarmed than we had ever been, simply because the 1st implementation clouded our judgement of where things currently stood. Sort of like a heroin addict who quits the smack by becoming an alcoholic: you have replaced one problem with another, albeit maybe one less intense on the surface, but still rather problematic.
2013 was a year of lawsuits between the various consulting firms involved in the implementations and ourselves. Our organization began flirting with SAP to do a real fix of the issues. The lawsuits left us in a state where we couldn’t really commit to future plans with SAP or anyone until they were resolved… so we just sucked it up and went on with our day to day lives throughout the year. By October, though I am not privy to the resolution of said suits, or even their current status, we contracted with SAP to get us through the 2013 year end.
SAP approached the situation like all good software implementation design teams would, and pushed us to do full-scale testing at all phases. This was an alien concept for us, as the 1st consulting firm never utilized this and, though the 2nd consulting firm encouraged it, they didn’t really adhere to this development model. So we have not exactly embraced this philosophy, as we have had limited exposure to it in the past. To prepare for the 2013 year-end development, we did client copies of our production system to our development and quality environments for the first time in 2013. The middle environment, quality, exists to do testing with better data than the development environment, as you should be copying the master data in your actual production environment into it at regular intervals, so that once you move the code out of development and into quality, you get a sense of how it will function with the live, ever-changing data as it exists in your production client.
That being said, we never have done, nor are we currently doing, copies of our production environment to our quality environment other than at the beginning. The one thing that was discovered at that point, and which has even further clouded our adoption of this development model and has even eroded the SAP developers’ will to continue enforcing it, is our inability to get these environments to behave identically even after a full refresh. It makes testing a best-guess proposition at best; a “we believe that once we move this code into production it will behave in this manner.”
Now that you have a little more on the landscape… back to the story. We have these workarounds in production as a quick fix so that the system doesn’t totally implode while we get the table fixes out of development. Due to an erosion of development protocols after the initial client refreshes that occurred in November, I spent a week breaking our quality environment to get it into a similar state as our production environment. But I am human, and with all software, there are too many intertwined moving parts for anyone flesh and blood to be able to track in one’s head, let alone actually impose upon anything code driven. The best I could do was get our quality environment into a somewhat similar state as our production environment in terms of data… the various workarounds in production, their impacts… their secondary impacts as time moved on after the first event… and the terminal impacts once the system began looking toward table information that didn’t exist, as it was prior to the 2012 implementation… yada yada yada. I halfheartedly suggested a production to quality system copy prior to this… hoping that SAP would echo this idea and run with it. But I was ignored by people inside my organization, and I guess we had beaten the SAP guys down to our own flawed ideas of software implementation at this point.
So I created a best approximation in quality of what was occurring in production and ran with it. By late last week, there were signs from our production environment that about 5% of our employee population would reach the bizarre (for me, the layman) terminal state where the system sought out table data from prior to implementation in 2012; thus the workaround would no longer function, and the issue of the system needing 2014 tables would present in reverse, where it would get out of sync for 2012. Early this week, I did my best and tested the fixes in the quality system, which bore no resemblance to reality. I was still getting some puzzling results, but nothing outside of my already lowered threshold for what could be considered a successful fix.
Two things occurred on Wednesday that have changed my view of how I fit into all of this madness. Come Wednesday, I was a little over 75% confident, based on the results I was seeing within the quality system, that the fix would resolve our issue in production and would probably not result in any additional unforeseen issues. After all, in our production environment, the system was dealing with two disconnected models, one for the future and one that we had always utilized in the past. Any fix that eliminated this gap between the models was necessary, even if it caused other issues down the road. This gap was causing too many base issues with how the system dealt with time, and if you know anything about SAP, you know the fucker is obsessed with time being linear. We had caused a huge disconnect there… so any fix that closed that gap was worth the consequences it caused.
Okay, this is the point where I need to stop speaking in quasi-tech speak and just get to the point. The first thing was that, though it was known that the issue was caused by an initial fix the SAP guys had put in place, there were still errors, though benign, that existed in the data cluster. I had to point out what they were and how they were caused. The big issue I have with this is that understanding where these errors came from was important in identifying what caused the issue in the first place. To not understand the results that were sitting there, after having identified the cause of the initial issue, fixed that, had a second issue arise due to a disconnect between the fix and the starting point, and then fixed that, seemed to indicate that the software experts lacked a true understanding of the system they were attempting to fix. I had a “hey, this is way outside my pay grade” moment, where I realized I was functioning in the role of a software consultant to MY firm and not the Payroll Manager I had signed on to be.
The second moment of clarity I had, early on Wednesday, was during a status meeting shortly after I began to suspect that our developers didn’t understand the landscape they were encountering. In attendance were the CFO, my boss’s boss, the CIO, who supposedly is the project manager in this madness (though he had asked me somewhat flippantly in the past if I wanted that role when he had experienced conflict from my boss and the CFO), and the developers. We started off from the approach that we could conclude our testing… and push these changes into our production environment prior to running the current week’s payroll a few hours from then. Seemed like a rational approach, as I was fairly confident it wouldn’t make things worse. Even if it did, we could deal with the consequences as they arose. The developers echoed these conclusions, and everyone else present had no experience with matters as they stood for the troops on the ground. Though they were innately cautious due to all of the failures in the past, they took the opinions of those of us who had some idea what was actually occurring into account. They moved on to the stalled year-end efforts we should have been completing over the previous three weeks.
My focus totally drifted away from the conversation around me. I wasn’t exactly reflecting back upon the entire two-and-a-half-year implementation experience, but more focusing in on how it was making me feel. I looked around the room and realized that the quick-fix methodology we had been employing this entire time was standing in the way of any complete solution we hoped to achieve. A feeling of dread crept through my body as I realized that we had no idea what the fix we had sitting in our quality client would truly do once it interfaced with reality. Saying I had 75% confidence it would be fine amounted to rat shit, as this was not primarily my job, let alone my area of expertise.
My job lay all over my desk in piles of neglected paperwork. It was also in my inbox in the form of emails I should have responded to weeks ago. My voice mail box was a smoking crater of calls I had been deleting after a week of neglect, on the wholly flawed logic that if it were truly important, they would have called back again or they had finally gotten hold of someone who could actually help them by that point.
As I really only had a tenth of an understanding of all aspects of the system I was trying to fix, and a 75% confidence level that the fix wouldn’t fuck things up more, the actual reality of the situation probably boiled down mathematically to the following: a tenth of 75% meant I had only a 7.5% understanding of how this change would impact our day-to-day operations. I looked at the time on my cell phone and realized we had typically started our primary payroll by this point in the day. We typically completed this process approximately an hour before the drop-dead time by which we had to send information to our bank. I had an hour of analysis left before I was going to feel fully calm with moving the changes into our production system.
Conversation continued to occur around me as I felt my blood pressure begin to rise. It dawned on me again and completely that I was functioning as a software consultant and my knowledge of how the fucking system actually functioned was not going to improve significantly in the course of the next hour.
It is at this point that I interrupted whatever the room was discussing, totally contradicting the decision we had reached earlier about the fix, and said we should not move these changes over to our production system. That we should just run payroll in the current broken state for one more period and move the changes over later. Everyone present looked at me like I had lost my mind. The wheels had come off. I took a deep breath whilst I composed my thoughts and myself a little bit. I looked back up at everyone and explained calmly (though in retrospect I might have totally sounded fucking insane) that I no longer had confidence in a lot of our base assumptions and therefore had no confidence in the conclusions we were arriving at. That I had no idea what would actually occur when these fixes made contact with reality. That if anything unexpected occurred, there would be no time to correct matters prior to our real-life deadlines. Though I didn’t expect anything of the sort to occur, there was no reason to believe that it wouldn’t… and based on past experience there was a high probability it would.
Everyone just looked at me in silence. I just looked at the CIO, whose final call it was in such matters due to his capacity as project manager. I didn’t even bother to look to my boss or her boss. I wanted him to call it. By focusing on him, I made him call it. At which point everyone objected at the same time, confused by my apparent 180 on this matter. I stood firm that this was my best analysis of how this all stood. Eventually they all just talked themselves out of steam, and I began to realize to my horror just how much weight my opinion actually carried. Though I was the lowest pay grade in the room, my opinion carried the day and, upon reflection, it has for at least the preceding year.
Thursday, once our primary payroll was complete, we rolled the changes into production. For the last two days of the week, I have been forcing these changes upon the data in the system. The entire time, I looked at my job lying all around me from a totally disconnected standpoint. At some point, it is going to have to be revisited. But now I know I am going to have to take a firm stand with my management on how these matters lie.
Needless to say, from a writing standpoint, I have been stuck in a black hole for the last six weeks. I am now left to figure out my relationship with my job… with the company that pays my bills…. And my art…. I now know that these are three wholly disconnected worlds I somehow have to integrate. At least one has to go immediately… stay tuned.