Reducing Support, Maintenance and QA Costs (RTB) with Self-Healing Software

In the recent past, organizations have focused on efficiency and effectiveness. This in turn put a lot of emphasis on time to market and productivity, while the quality and consistency aspects of software delivery somehow always took a back seat. A focus on time to market is extremely important for being an early mover and reaping the benefits of market leadership and revenue share.

So many delivery teams are forced to choose between the two, and oftentimes they sacrifice the quality and consistency of delivery to the customer. Over the years, the thinking was that quality could be brought in by encouraging the use of application development frameworks, library functions, reusable code snippets, sample design documents, design patterns and so on. This is the view from a development standpoint.

Maybe the above was not so effective, and that is when independent QA/testing became more popular: many customers adopted black box functional testing to ensure that the business requirements are met as per user expectations.

Despite these approaches, code still fails in production, and support and maintenance costs are still considered a necessary business operations expense.

The inherent knowledge of an SME (subject matter expert) is revealed only when they handle specific challenges on production code under SLAs and penalties. Given a situation, they take certain steps, resolve the issue, and sit back and relax. This knowledge is somehow not reusable by others, as it could be specific to certain customers, projects or environments, and may also be shrouded in data privacy and legal commitments to the customer. So the question remains: how do we reduce RTB (run-the-business) costs, or the cost of software maintenance, while leveraging existing knowledge?

One thought for reducing this cost is to design self-healing software. Is it possible? Can software heal itself?

I might have wondered the same if I did not know about tyres that seal themselves in the event of a puncture. If a running part like a tyre in a high-speed automobile can be designed to heal itself after a puncture, then it should be possible to do the same with software: it should be able to recover, diagnose the cause with metadata, and execute alternate recovery code to run business as usual, parking the exception for a later recovery/resolution.

This metadata is the knowledge required to process the event, and the physical actions taken by an SME to resolve it are the diagnostic steps. Once the event and the steps are captured as actions in a database, software can theoretically be built to heal itself temporarily until a permanent fix is made.
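To make the idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a real product: a small in-memory table stands in for the database of SME actions, and every name in it (RECOVERY_ACTIONS, parked_exceptions, register_recovery, self_healing, load_price) is hypothetical. When a known failure occurs, the wrapper parks the exception for a later permanent fix and runs the alternate recovery code so that business continues as usual.

```python
import functools

# Hypothetical stand-in for the database of SME actions:
# maps an exception type to the alternate recovery step an SME would take.
RECOVERY_ACTIONS = {}

# Exceptions parked for a later, permanent resolution.
parked_exceptions = []


def register_recovery(exc_type, action):
    """Record the recovery action an SME took for a given kind of failure."""
    RECOVERY_ACTIONS[exc_type] = action


def self_healing(func):
    """Run func; on a known failure, park the exception and run the recovery action."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            action = RECOVERY_ACTIONS.get(type(exc))
            if action is None:
                raise  # no captured knowledge for this failure: fail as before
            parked_exceptions.append((func.__name__, exc))
            return action(*args, **kwargs)
    return wrapper


# Illustrative data: a price lookup that falls back to a default value.
PRICES = {"apple": 1.5}


@self_healing
def load_price(item):
    return PRICES[item]


# The "captured SME action": when a price is missing, use 0.0 for now.
register_recovery(KeyError, lambda item: 0.0)
```

With this in place, load_price("pear") returns the fallback value instead of failing, and the original KeyError sits in parked_exceptions waiting for someone to make the permanent fix. A failure with no registered action is still raised as before, so the sketch never hides problems nobody has diagnosed yet.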

A solid Knowledge Management (KM) program should not try to extract knowledge explicitly; it should capture the actions as metadata then and there, as the event occurs and is resolved. Neural networks can then be constructed that learn further from the actions taken by the SME.

Suppose we are able to build such KM diagnostic software as a wrapper on a commercial “ticketing application”. This would provide all the metadata required to build self-healing intelligence into the base application whose issues are reported into the ticketing application. The neural network in turn can provide predictive capabilities.

Futuristic applications will soon have self-healing capability. To build self-healing software we need metadata, a neural network, and a database that captures SME actions until such time as we have enough actions to create diagnostic programs.

This may sound a bit far-fetched but that is what man thought about air travel a hundred years back.

Moments of Truth- QoS

As I mentioned in my previous post, I took a small summer break visiting the Silicon Valley of India, Bangalore. My experience prompts me to make certain observations, as I had several “moments of truth” in the short span of 4 days.

As always, we try to look for the best for the family, and I did the same given my means (my budget) and the desire to get the best value for money. I had booked a nice hotel via a travel agent. For the purposes of privacy, I will use fictitious names for the businesses and real place names. Let me call my travel agent “master mind travel agency” and my hotel “BMC Holiday Inn”.

Booking the tickets and rooms was fraught with tension, as getting the right hotel at the right place at the right time is a harrowing experience. However, with some extended use of the internet and a few phone calls, I thought I had got it all done to my satisfaction.

My expectation before my travel was simple –

a)      To have a clean room

b)      Near the commercial centre of the City

c)       With a good travel desk

d)      Decent food in the restaurant

Predominantly I can say that objectives a) and b) were met, and I can give a score of 3.5 out of 5. The rooms were clean and OK, and yes, I was right on Brigade Road, a very good commercial hub of the city and very close to a lot of its sights.

When you are new to a city, it is very hard to navigate, and you need a good tour guide to take you around without ripping you off. That was the premise with which I approached the travel desk of BMC Inn. The BMC Inn front desk was helpful. I wanted to take an excursion to Mysore and see Melkote, Srirangapatna and then the Brindavan Gardens at Mysore. They said I could take the hotel cab, start at 8:30 AM and return the same day. The rates were communicated as Rs 2250/- for 250 KM and Rs 7/- per additional KM.

So we did start as per plan and completed 50% of the journey. We finished Melkote and Srirangapatna, and on the way to Mysore we hit a big, long and nasty traffic jam. The cab driver was called Karthick, and he was a fairly senior person in terms of age, 50+.

So I gently asked him how long he thought it would take to clear, and he said he did not know and intended to wait it out. 10 minutes became 30 minutes. I asked again and got the very same response. It was 3:00 PM and we still had not had lunch; the car engine was turned off to save petrol, hence no AC, and we were sweating like hell. He still did not know and did not have a plan.

I got worried and asked him to turn back or find an alternate route. Still the same response: there was only one way and one route, he was unwilling to turn back, and he was asking me what he should do next. I suggested that we might consider turning back. But soon that was no longer an option, as the traffic piled up behind us as well and we were completely gridlocked.

So it was a sweaty, uncomfortable 60 minutes of waiting, with a lot of cursing on my side at the traffic and the stupidity of the driver. Finally the traffic moved (only by God's grace) and we made it to our next stop. This was the first moment of truth for me.

If you are in a long traffic Jam, the hotel cab driver has no idea what to do.

Despite being the tourist in an unfamiliar place, I was required to provide solutions to the situation.

Is this familiar? This is how big IT customers sometimes get stuck with offshore vendors. When you are in a crucial project and you hit a roadblock, the vendor does not provide solutions but instead asks you what to do next, whereas the expectation from the customer is “You tell me, you are the expert”.

In my tourist situation, I simply could not fathom how a cab driver with so many years of experience, living in the same city and having done similar trips hundreds of times, could not have an answer.

After we crossed this hurdle, we stopped for lunch and then proceeded towards our next halt. Our experienced cab driver did not really seem to be so experienced. His driving was jerky, and he hit the brakes so hard that the passengers got thrown about. I had not noticed this much earlier, but it somehow became more obvious in this leg of the journey.

We were at a signal waiting for the light to turn green when there was a tremendous noise at the back: our cab had been hit from behind by a Mercedes Benz, thankfully not at full speed. Its driver was trying to stop but somehow could not.

Out we all stepped to inspect the damage. It was mild but would still cost a lot to fix. Our cab driver got into a discussion with the Benz guy. We were left standing on the road for more than half an hour, and I had to remind him several times that “I am paying for this trip and the meter is running on my money”. Finally he relented and got some details from the Benz guy.

By this time, I was pretty sure that what I wanted to see in the next halt would be closed by the time I reached there. So I told the cab driver to cut my trip and return back to the hotel.

It may be a lengthy story; however, the moments of truth that I had on this trip, as I reflected on the incident, were enlightening.

So despite a high room rent (about 75 USD per night) all I had was a bitter experience with the hotel.

3 major incidents stand out for me

a)      Cab driver not able to provide a solution before the traffic jam became a grid lock

b)      Cab driver does not care about the tourist spots I want to see in the trip

c)       Cab driver  is not conscious that I am spending  x bucks per hour of the cab rental

So most IT vendors are like this hotel. It is big and posh and charges a high rent in the name of the brand, but it was ultimately the cab driver that mattered the most to me. The cab driver is more or less the delivery manager that you get: if you are in luck you have a good driver, if not, a crappy driver and a bitter experience.

Suppose the hotel was ISO certified: would it have given a better experience? I don’t think so; you could still end up with a crappy cab driver.

So if you are a business and want to outsource, it is essential that you do proper due diligence on the services you intend to purchase from the vendor, and also ensure that you are comfortable with your cab driver before you sign the deal. Changing the cab driver is always an option, but it always comes a bit late, only after you have incurred some inconvenience or damage.

While I had expectations a), b), c) and d) as listed above, item c) was the most important for me as a tourist, and I failed to do proper due diligence, which cost me a lot of money and agony as well.

So if I think about the expected behavior of the cab driver, I would have wanted him to be proactive in thinking about traffic conditions; if we still got into a situation, to use his local skills, language and knowledge to navigate us out of the problem; and if not, at least to provide the right guidance and input to help me take an informed decision.

None of the above was forthcoming, nor did it seem to be within the capability of the cab driver, and this is what any customer would mean by “lacking in thought leadership”.

So how does it all fit into the quality bandwagon? My thought is “Measuring the Quality of Service”.

How many of us measure the quality of service? More often than not, we are used to responding to a tailored questionnaire sent by the vendor with some ranking numbers. We fill in the numbers and the vendor goes away, not to come back for another 6 months. By that time the entire team on the ground would have changed as well.

See you in next post with some more thoughts around the QoS.

Quality Reminiscences

I was on a short summer break, and it gave me some time to introspect on various things in life and on how we view quality in other aspects of our life, particularly in the hospitality industry.

Is quality restricted to software alone? How should we treat quality in the services space? There is also my own realization of how it feels to be the customer who has outsourced services to an offshore vendor, and how we (the vendor) generally fall short of expectations.

I will be writing in detail once I re-organize my thoughts into something more organized and interesting.

Stay tuned…

Cheers

Bala

How does Google test its software..?

I never thought about this question until it was put to me by a customer in a proposal discussion. It was a valid and interesting thought and I did not have an answer until I started Googling for one. Fortunately Google, being the good Samaritans they are in all initiatives, have answered this question in their blog. You can read about it here: http://googletesting.blogspot.com/2011/01/how-google-tests-software.html.

It is a 7-part series written by James Whittaker, Test Director at Google, who joined from Microsoft in 2009. The key tenets of testing inside Google can be summarized in the following points, and I have taken the liberty of comparing/mapping them to project teams in typical organizations, along with my views and observations.

Test Organization

Well according to James, they do not have a testing organization at Google (despite the fact that he is the director of testing!!). Anyway Google has a lot of Focus Areas (FA). Engineering Productivity is a Focus Area and a lot of testing exists inside the Engineering Prod FA. There are several horizontal and vertical disciplines inside the Engineering Prod and Test seems to be the largest team over there. The 3 major teams that constitute Engineering Prod are:

Product Team- This team produces tools and as per James “The idea is to make the tools that make engineers more productive. Tools are a very large part of the strategic goal of prevention over detection.”

These teams build code analyzers, automated testing tools, test case management systems, bug databases etc. These tools are consumed by all walks of engineers across the company.

Services Team – This team provides expertise to the product teams in a variety of areas like tools, testing, release management and training. They also have experts in the areas of reliability, security and internationalization testing.

Embedded Engineers – These are people loaned out to product teams on a need basis. A tester identifies with a product team and reports to an Engineering Prod manager. At the same time the tester is part of the test team and has a reporting structure different from that of the project team.

According to Google, “The benefit of the separate reporting structure is that it provides a forum for testers to share information. Good testing ideas migrate easily within Eng Prod giving all testers, no matter their product ties, access to the best technology within the company”

Who Owns Quality

Before discussing further on the test roles, it is fundamentally important to understand the question of “who owns quality?” According to James “At Google it’s the product teams that own quality, not testers. Every developer is expected to do their own testing. The job of the tester is to make sure they have the automation infrastructure and enabling processes that support this self reliance. Testers enable developers to test.”

Testers are facilitators, and hence they are productivity enhancers for the development team, as against the quality gates/managers that most development teams around the world expect them to be. If the coding is not right and the customer is not happy, it is always easy to point at faulty testing amidst tight timelines and generally unfair expectations of the test team. The onus of quality needs to reside with the development teams.

Test Roles

Again quoting from James’s blog: “At Google we have created roles in which some engineers are responsible for making others more productive. These engineers often identify themselves as testers but their actual mission is one of productivity. They exist to make developers more productive and quality is a large part of that productivity”. Having said that, the typical roles that are referenced are:

SWE (Software Engineer) is the typical developer. They create design documents, write code, review code and also write a lot of test code for test driven development and unit tests. In short, “SWEs own quality for everything they touch whether they wrote it, fixed it or modified it.”

SET – “Software Engineer in Test is also a developer role except their focus is on testability. They review designs and look closely at code quality and risk. They re-factor code to make it more testable. SETs write unit testing frameworks and automation. They are a partner in the SWE code base but are more concerned with increasing quality and test coverage than adding new features or increasing performance”

TE or Test Engineer – “is the exact reverse of the SET. It is a role that puts testing first and development second. Many Google TEs spend a good deal of their time writing code in the form of automation scripts and code that drives usage scenarios and even mimics a user. They also organize the testing work of SWEs and SETs, interpret test results and drive test execution, particularly in the late stages of a project as the push toward release intensifies. TEs are product experts, quality advisers and analyzers of risk.”

Quality Engineering

It is fairly evident how the interplay of the above roles ensures that quality is engineered into the product/code, with test as the mechanism to measure the quality that has been engineered in. Rightly so, that is the rationale behind the roles at Google. “Quality cannot be tested in”. “Quality is more an act of prevention than its detection and Quality is a development issue and not a testing issue”.

Good quality results when we are able to successfully embed testing inside development and aim at prevention of bugs and defects creeping into the finished product. Testing is a mechanism to measure how well this prevention method is working.

Incremental Features / Test Driven Development

Google also attributes its success to small incremental releases: build a few features and release them to users the moment they are deemed useful, then get feedback from actual users and keep iterating to perfect the product. Gmail was technically in beta for 4 years, and the tag was removed after it reached 99.9% uptime for real users’ email data. This is what they call the “crawl, walk and run approach”.

Small, Medium and Large Tests

Another interesting aspect of Google’s approach to testing is that, instead of distinguishing between code, integration and system testing, it uses small, medium and large tests, emphasizing scope over form:

“Small Tests are mostly (but not always) automated and exercise the code within a single function or module.

Medium Tests can be automated or manual and involve two or more features and specifically cover the interaction between those features.

Large Tests cover three or more (usually more) features and represent real user scenarios to the extent possible”
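To show what the difference in scope looks like in practice, here is a minimal sketch using Python's unittest module. It is my own illustration, not Google's code: the two "features" (apply_discount and cart_total) are hypothetical and deliberately tiny. The small test exercises a single function, while the medium test covers the interaction between the two features.

```python
import unittest


def apply_discount(price, percent):
    """Feature 1 (hypothetical): reduce a price by a percentage."""
    return round(price * (100 - percent) / 100, 2)


def cart_total(prices):
    """Feature 2 (hypothetical): sum the prices in a cart."""
    return round(sum(prices), 2)


class SmallTest(unittest.TestCase):
    """Small: exercises the code within a single function."""

    def test_apply_discount(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)


class MediumTest(unittest.TestCase):
    """Medium: covers the interaction between two features."""

    def test_discounted_cart_total(self):
        discounted = [apply_discount(p, 50) for p in (10.0, 30.0)]
        self.assertEqual(cart_total(discounted), 20.0)
```

A large test would go one step further and drive a real user scenario across three or more features, which usually means a deployed application rather than a couple of functions, so I have left it out of the sketch.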

Google also does a great deal of manual testing, both scripted and exploratory, but with a focus on test automation at the earliest instance. The fundamental rule for automation is “If it can be automated and the problem doesn’t require human cleverness and intuition, then it should be automated.” Up to this point I have covered 5 parts of the 7-part series; the final 2 parts deal with the life of an SET and a TE, and as per the blog author, the role definitions are still evolving.

Given that Google is a technology-intensive product company, it has so far been able to build quality into its products successfully. But as it diversifies into new areas, it remains to be seen how much of this can be adopted in other areas of software development. A couple of things that were evident are that Google operates with very few TEs, and that they are NOT inducted early into the development cycle but could be brought in at any point in time. This is left to the product manager to decide, as he owns the quality.

Verification answers the question “Are we building the product right?” and validation answers the question “Are we building the right product?”. The SET role and quality engineering approach of Google definitely address the verification aspect, while the validation aspect is brought in via the incremental features released to end users. The V model allowed test engineers to work with the business and review the designs and specs so that they meet the customer requirements; the agile/iterative model adopted at Google seems to do away with early tester involvement, and instead a lot of effort is spent on trial-and-error code that needs to crawl, walk and run before it can be handed over to the users.

This is fairly evident when James makes the following statement: “Test Engineers have little to do early in the development cycle when features are still in flux and the final feature list and scope is undetermined.” This goes against the fundamental belief of how QA/testing is traditionally approached. In many situations the actual end users of the product never have the time to sit with the product developers, hence the BA and the tester filled their role through early involvement and review of specs and design to cover the question of “validation”. Even if the product development team has the time and budget to make 100 releases, the business and end users do not have the time to look at each increment, visualize the end product and make their comments.

There definitely are a lot of things to learn from Google and their approach to quality engineering. In particular, the “software engineer in test” stands out as a key and integral role in engineering quality, along with their crawl, walk and run approach. However, they do miss out on the validation part, and understandably so, as they are mostly a product engineering company working on highly technology-intensive products. When they deal with business applications like finance or retailing, they will face validation gaps that will need to be addressed. Otherwise it was an interesting read for me, and if you feel the same, drop a couple of lines with your thoughts.

I look forward to your comments.

Why test websites across Multiple Browsers?

Time and again, with the browser wars gathering steam between IE, Firefox, Safari and Google Chrome, there is an increasing need for web content publishers to ensure that content delivered over the internet is displayed properly across various combinations of browsers.

If the content delivered is not rendered well, the viewership of the website will come down. It is very difficult to hold the interest of a surfer for more than a few seconds to a few minutes, and unless the experience is good and found useful, you have lost a reader and also some potential revenue, in terms of hit count or advertisements which the surfer (web user) might have clicked.

Today most print content providers have a digital business unit that aims to increase viewership/membership to make up for losses on the print side of the business. The revenue model is mostly driven by online advertising, and many players have seen good success and growth. This has made them look towards making their websites more sticky, and one good technique is to integrate with social media websites like Facebook, Twitter etc.

Today the content rendered is dynamic and includes multiple formats like video, audio files, text, hyperlinks, blogs and RSS feeds, coupled with widgets and 3rd-party applications for social networking integration, or with hosting platforms like Akamai for scalability.

This makes testing a website in production extremely complex: sign-on to the website and partner websites, correct rendering of articles from content management systems, good quality of digital media (audio/video), proper working of widgets, and correct and timely display of advertisements. All this is now coupled with the need to test these combinations over multiple operating system/browser combinations.

Well, it can be argued: why test with all combinations? We can always go with the majority combinations, which would lead to a Windows/IE combination or a Mac/Safari combination.

The figure below shows the browser market share as of today:

[Figure: browser market share trend. Source: http://www.statowl.com/web_browser_market_share_trend.php]

However, this means we are ignoring a good segment of users on Chrome, Firefox, Opera or any other emerging web browser, and this means loss of revenue. It is better to target 99% of browsers than to settle for 70% of browser combinations. The primary change required is in the development platform or approach first, followed by an understanding of the need for testing across multiple browser and operating system combinations.

Having given the business case, we can look at some tips for testing across multiple browsers. There are two primary approaches to this based on your budget availability:

a)      Hard & dirty way

b)      Automated way

The hard & dirty way basically requires that you install multiple versions of the browsers on one machine and replicate all the versions across each operating system. This requires multiple physical machines or a clever virtualization option.

This option is fine when the budget is limited, but it requires installing and uninstalling software across the machines because of compatibility issues between the browser, the OS and critical components like Flash, QuickTime or the other widgets and media players used in the website. It gets more complex when the website supports Flash-based animation, which requires extensive, laborious manual testing and takes more time. Frequent installation and uninstallation can leave the browsers or the OS registry corrupted. More than that, in many organizations testers do not have admin access and need to wait for the IT support team to do the installation every time something needs to change. This creates a lot of delays in getting the site tested properly and on time.

The other, better way is the automated way, which is tool based. There are several popular tools available, such as Browser Shots, Egg Plant, Multi-Browser viewer, Selenium and Gomez, to name a few. These tools can provide automated verification, and a one-time investment in creating the automation scripts will pay rich dividends. After the one-time investment you will need only a part-time or full-time person (FTE) for enrichment and maintenance of the automation scripts. This approach can also make testing about 90% automated, with defect reporting automated via email and faster retesting. You can run an isolated test or a comprehensive regression depending on the changes to the website. Using a suitable automation framework (keyword driven, data driven or hybrid) will ensure a good automation code library, giving quick turnaround of new automation scripts plus easy maintenance. A well-written framework can even allow a business user to run the automation scripts alone, with minimal intervention from the automation test developer.
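To illustrate the keyword-driven idea, here is a minimal sketch in Python. It is an illustration under my own assumptions, not the design of any of the tools named above: FakePage is a hypothetical stand-in for a real browser driver so that the sketch runs anywhere, and the keyword names (open, type, verify_field) and the example URL are made up. The point is that a test case becomes plain data, while the automation developer maintains only the keyword library.

```python
class FakePage:
    """Hypothetical stand-in for a browser driver, so the sketch runs anywhere."""

    def __init__(self):
        self.url = None
        self.fields = {}

    def open(self, url):
        self.url = url

    def type_text(self, field, text):
        self.fields[field] = text


def kw_open(page, url):
    page.open(url)


def kw_type(page, field, text):
    page.type_text(field, text)


def kw_verify_field(page, field, expected):
    actual = page.fields.get(field)
    if actual != expected:
        raise AssertionError(f"{field}: expected {expected!r}, got {actual!r}")


# Keyword library: the only part an automation developer maintains.
KEYWORDS = {
    "open": kw_open,
    "type": kw_type,
    "verify_field": kw_verify_field,
}


def run_steps(page, steps):
    """Execute (keyword, *args) rows; return a list of (row_number, error) failures."""
    failures = []
    for number, (keyword, *args) in enumerate(steps, start=1):
        try:
            KEYWORDS[keyword](page, *args)
        except Exception as exc:
            failures.append((number, str(exc)))
    return failures


# A test case is plain data, which a business user could keep in a spreadsheet.
login_steps = [
    ("open", "http://example.com/login"),
    ("type", "username", "bala"),
    ("verify_field", "username", "bala"),
]
```

Calling run_steps(FakePage(), login_steps) runs the three rows and returns the list of failing rows, which is empty when everything passes. In a real framework the keyword functions would call the chosen tool instead of FakePage, and the rows would be loaded from a spreadsheet or a test management system.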

Having looked at the two approaches, the key to successful cross browser testing is the browser/OS combination matrix, which needs to be looked into carefully. This matrix can drastically increase or decrease the number of test cases required. A technique that comes to mind is pairwise testing, which can be applied to select a small set of combinations that still covers every pair of values in the matrix.

In addition, the focus should be on ensuring that all navigational elements, content alignment and fonts, search functions, widgets and 3rd-party integrations (like social networking and games) work to expectations across all the browser combinations. Some websites may also require multi-language capabilities, and hence checking the correct appearance of language fonts based on the appropriate user profile settings is also important.

One of the common problems in website testing is having more than one content management system in place. This results in confusion about what content is rendered from where when you don’t have clear and controlled processes for development (did I hear Agile?).
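As a rough sketch of how pairwise selection works, the Python below greedily picks full combinations until every pair of values has appeared together at least once. It is a simple greedy heuristic of my own for illustration, not the algorithm of any particular tool, and it is not guaranteed to find the smallest possible set. The matrix values are hypothetical, and the sketch does not filter out combinations that cannot exist in practice (such as IE on Linux), which a real matrix would need to exclude.

```python
from itertools import combinations, product


def pairwise_combinations(parameters):
    """Greedily pick full combinations until every pair of values is covered.

    parameters: dict mapping a parameter name to its list of values.
    Returns a list of dicts. The result covers all pairs but is not
    guaranteed to be the smallest possible set.
    """
    names = list(parameters)
    # Every (param_a, value_a, param_b, value_b) pair that must appear together.
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }
    candidates = [dict(zip(names, values))
                  for values in product(*(parameters[n] for n in names))]

    def pairs_of(combo):
        return {(a, combo[a], b, combo[b]) for a, b in combinations(names, 2)}

    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


# Hypothetical matrix for illustration only.
matrix = {
    "os": ["Windows", "Mac OS", "Linux"],
    "browser": ["IE", "Firefox", "Chrome", "Safari"],
    "flash": ["installed", "missing"],
}
suite = pairwise_combinations(matrix)
```

The full matrix here has 3 x 4 x 2 = 24 combinations. The pairwise suite can never go below 12, because each of the 12 OS/browser pairs must appear at least once, but it comes out smaller than the full 24. Note that pairwise selection covers every pair of values, not every full combination, so a defect that needs three specific values together can still slip through.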

Next, videos uploaded may not be verified properly for resolution, size, duration and quality. The images may also not display properly due to cache management issues.

Not having a clear segregation between the content & producer (C&P) team and the technical team will always result in more bugs, as there is always an overlap of responsibilities due to the lack of clear processes. Predominantly, the issues that occur can be traced back to either content or a component. Component-related issues need to be addressed by the development teams, and content-related issues by the authoring team or C&P team.

Finally, UI checks can be automated via tools like Selenium or QTP if you choose the automated way. However, we need to bear in mind that QTP may not be available on Mac OS, and hence it is preferable to go with Selenium or other open source tools for cross browser testing.

So if you have a hot web property and have decided to extend its reach to an audience beyond the 50% market share that IE provides, you definitely need to look into cross browser testing. Unless you take a professional approach, your development and content teams will very soon end up in a mess and pile up a large amount of technical debt that will take forever to clean up and cost a lot more money than investing a bit in prevention and appraisal costs, which will always lower your overall cost of quality.

The Vision

Hi folks

Welcome to Balaraman’s Blog world. I intend to share my experiences, thoughts, outlook on the future and many other aspects of the software quality assurance and testing field. As I go along I will be sharing my research in this area: the problems and challenges faced in specific types of engagements, how they were overcome and what lessons were learned, for the benefit of the software testing community. Happy reading, and if you like something I would be glad to hear your words of encouragement, or if not, your feedback for improvement.

Thanks for reading.

Bala