Wednesday, May 23, 2012

Google App Engine issue on May 23 and 24, 2012 - High latencies

This problem did not affect every site hosted on Google App Engine. There was not much of a response at first when someone reported that some sites hosted on GAE were responding much more slowly than usual, nearly 20-30 times slower. So the issue was left as it was on May 23rd. When more people reported it, Google started taking it seriously. These were the issues:
a) Front end instance hours increased by 3 to 4 times
b) Quota denials for billing-enabled applications - 403 Forbidden
c) 500 errors or other error codes with no explanation

Most users tried to work out whether there was anything fishy in their own code: they rolled back to a previous version, flushed the cache, took help from paid Python experts, and so on. No one was able to find the reason. Other sites hosted under the same account had no issue. So it was a really hectic day for the developers. Since Google had just rolled out the new 1.6.6 release, there was also some doubt about whether that release was the cause. About 4 hours after the issue was posted publicly, Google engineers started looking into it.

Even after migrating an app to the High Replication Datastore (HRD), the problem persisted. Some developers suspected that putting one massive object into memcache might be causing the issue. They started playing with memcache, but nothing changed. Ideally, the memcache service should have steady latencies, but unfortunately it is not as stable as HRD. Finally, after 15 hours, a Google engineer reported: "This seems related to an infrastructure issue that affected a small portion of python master slave application during the 166 rollout. A temporary workaround has been put in place and the reliability team is still investigating for a long term solution."

Thanks to the Google engineers. Hoping for a permanent solution to this.