# 1 | ||
|
Administrator
Join Date: Feb 2006
Location: Saratoga Springs, NY
Posts: 2,338
|
At approximately 8:30am EST this morning, all Articulate Online hosted accounts went offline. All accounts were back online as of about 4:30pm EST. The problem was related to a network/IP address configuration issue at our data center. The good news: No data was lost and no accounts were compromised.
We’re Very Sorry
We’re really sorry for this downtime. We know you and your customers count on us for your elearning programs, and we hate to let you down like we did today. We accept full responsibility for what happened. We’re investigating the technical specifics of what went wrong and are working to understand what caused this problem. We’ll also be evaluating our disaster preparedness plan to avoid something like this in the future.
What Went Wrong
Here’s the rough timeline of events that we experienced today:
We know how frustrating it is when a product you depend on to run your business is suddenly not there. So thank you for hanging in there with us. This was an important experience for us that we’re taking very seriously. We’re committed to improving our server infrastructure to minimize the chances of this happening again. In the past year since AO went live, we’ve experienced minimal downtime, most of which was a result of brief, planned downtimes for product upgrades. As always, please feel free to contact me directly with any specific questions or concerns (or discuss your concerns in this forum thread).
__________________
Gabe Anderson Articulate Director of Customer Advocacy E-Learning Heroes - NEW forums, tutorials, downloads, and more! Last edited by gabe : 01-15-2008 at 06:17 PM. |
||
|
|
|
|
# 2 | ||
|
Member
Join Date: Jan 2008
Posts: 3
|
Hi Gabe,
Thanks for your explanation of today's events. While we're a relatively new customer, we have invested significant financial and manpower resources in implementing your system. Unfortunately, today's downtime has caused us significant hardship with our largest client, as over 100 of their staff members attempted to take an online certification exam today and were unable to do so.
I understand that downtime occasionally happens and cannot be avoided. I also realize that your primary objective was to get the service up and running again. That being said, because we depend on your product to support our clients, I have a number of specific questions I would like to see addressed. Rather than sending an email, I think these questions would be best addressed in the forums, as your other customers may have similar questions. Thank you in advance for addressing these questions/issues fully.
Please explain specifically what DNS issue led to this downtime and what is being done to ensure it is not repeated.
You state the downtime was noted at 8:30 AM and the issue was not identified until 11:30 AM, which is quite a long time. Why the delay?
Why was an email not sent to all of your customers as soon as the server issue was identified? Here we are 12 hours later, and I STILL have not received an email. While it's great that you post in the forums, it shouldn't be the only communication vehicle--I'm sure many of your customers don't frequent the forums on a regular basis.
I inferred from your email that Articulate Online is running on a single server. (Your post said: "primary Articulate Online server, where all hosted accounts reside") Is that correct? I would expect that when purchasing a SaaS product such as Articulate Online, the service provider is using multiple load-balanced servers in a cluster, with a hot standby cluster, preferably at a different data center. Could you please explain your infrastructure?
Why did it take such a long time to redirect people trying to access the portal to a static web page saying the site was down?
||
|
|
|
|
# 3 | ||
|
Member
Join Date: Dec 2007
Posts: 215
|
I'm still trying to publish something and it says that it cannot connect to Articulate Online. My internet connection is working. Is this related to today's downtime?
Very frustrated... |
||
|
|
|
|
# 4 | ||
|
Member
Join Date: Jan 2008
Posts: 3
|
It appears as though the site is down again. Anyone else experiencing this?
|
||
|
|
|
|
# 5 | ||
|
Member
Join Date: Feb 2006
Posts: 109
|
I just tested publishing and viewing content, and everything looks fine for me. What are you seeing?
|
||
|
|
|
|
# 6 | ||
|
Member
Join Date: Jan 2008
Posts: 3
|
Seems to be working okay again. Never mind.
|
||
|
|
|
|
# 7 | ||
|
Moderator
Join Date: Feb 2006
Location: Boulder, CO
Posts: 985
|
Hello Jtarnoff,
My name is Dave Mozealous and I am the QA Project Lead for Articulate Online. First and foremost, I want to apologize for the downtime that happened today. Although your questions were directed at Gabe, I wanted to take the time to respond to them, as I was one of the people responsible for getting Articulate Online back online as soon as possible.
> Rather than sending an email, I think these questions would be best addressed in the forums, as your other customers may have similar questions.
Great, no problem. One of the reasons we decided to post in the forums about this is so that we could be open and honest about the issue and allow feedback from any concerned customers.
> Please explain specifically what DNS issue led to this downtime and what is being done to ensure it is not repeated.
Basically, what happened was that another server was brought online in our datacenter today with a MAC address/IP address matching our primary Articulate Online IP. This caused traffic that was being routed to Articulate Online to never make it past the switch.
> You state the downtime was noted at 8:30 AM and the issue was not identified until 11:30 AM, which is quite a long time. Why the delay?
It took the network engineers in our datacenter longer than anticipated to figure out that this was causing the issue, and once the issue was identified, it took longer than anticipated to identify the server that was causing the problem. I am extremely disappointed that it took as long as it did to identify and correct the cause of the issue, and we are currently working to understand exactly why this happened and what our datacenter can do to prevent it from ever happening again.
> Why was an email not sent to all of your customers as soon as the server issue was identified? Here we are 12 hours later, and I STILL have not received an email. While it's great that you post in the forums, it shouldn't be the only communication vehicle--I'm sure many of your customers don't frequent the forums on a regular basis.
To be honest, the reason we didn't send out an email is that we kept thinking that it was only going to be a matter of minutes before we got it up and running again. You are right, notification should have been sent stating that we were experiencing downtime, and going forward we are committed to being more proactive in our communication. We are working on setting up a permanent status page and an RSS feed that users can access to get a real-time status on Articulate Online. We will have these implemented by the end of the week.
> I inferred from your email that Articulate Online is running on a single server. (Your post said: "primary Articulate Online server, where all hosted accounts reside") Is that correct? I would expect that when purchasing a SaaS product such as Articulate Online, the service provider is using multiple load-balanced servers in a cluster, with a hot standby cluster, preferably at a different data center. Could you please explain your infrastructure?
You are correct. Articulate Online runs on multiple servers with full mirroring.
> Why did it take such a long time to redirect people trying to access the portal to a static web page saying the site was down?
The reason it took so long is that, because traffic wasn't making it to our Articulate Online servers, we would have had to change our DNS to point to a new server where we could put up a status announcement. Changes to a DNS server can take some time to propagate across the internet (up to several hours), so we were worried that changing the DNS to point to another server would only increase the amount of time before we could get Articulate Online back online. Once we had identified the issue, we truly thought we were only 20 minutes away from fixing it, so we were concerned that changing something at the DNS level would cause hours of unavailability for something that should be fixed in only a couple of minutes. Once it became apparent that it was going to take longer than anticipated to get the site up and running, we proceeded with the DNS change.
Once again, I am very sorry for the downtime today. We are very serious about preventing anything like this from happening in the future, and we look forward to regaining your trust in us. If you or anyone else would like to speak to me about today's issue, feel free to PM me with your phone number and I'll personally call you in the morning.
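For readers weighing the same DNS trade-off, here is a rough Python sketch of the reasoning. It is only an illustration, not a description of how Articulate Online's DNS was actually configured: the function name `worst_case_recovery` and the 4-hour TTL are assumptions (the post only says "up to several hours"), and it assumes resolvers honor the record's TTL, which real resolvers do not always do.

```python
from datetime import timedelta


def worst_case_recovery(time_to_fix: timedelta, ttl: timedelta, redirect: bool) -> timedelta:
    """Worst-case wait before a visitor reaches the restored service.

    Assumes resolvers honor the record's TTL. If DNS was pointed at a status
    page, it has to be pointed back after the fix, and a resolver that cached
    the status-page record just before the switch back keeps serving it for
    up to one more TTL.
    """
    return time_to_fix + ttl if redirect else time_to_fix


# 20 minutes is the estimate quoted in the post; the 4-hour TTL is a made-up
# stand-in for "up to several hours".
estimate = timedelta(minutes=20)
assumed_ttl = timedelta(hours=4)

print(worst_case_recovery(estimate, assumed_ttl, redirect=False))  # 0:20:00
print(worst_case_recovery(estimate, assumed_ttl, redirect=True))   # 4:20:00
```

Under these assumptions, a redirect only pays off once the expected repair time is well beyond the TTL, which matches the decision described above to hold off on the DNS change while the fix still looked minutes away.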
__________________
Dave Mozealous Quality Assurance Manager Visit my blog @ http://www.mozealous.com You should follow me on twitter here. |
||
|
|
|
|
# 8 | ||
|
Administrator
Join Date: Feb 2006
Location: Saratoga Springs, NY
Posts: 2,338
|
Hi jtarnoff-
As I mentioned in my email to you a bit ago, I want to echo Dave's comments here that we're very sorry for the trouble today and can assure you that we will do everything we can to prevent this from happening again. I'm very sorry that the timing of this was so bad for you with your planned launch today. We'll keep you posted on the measures we're taking to improve the AO service monitoring, as well as how we'll handle notification of any future issues.
__________________
Gabe Anderson Articulate Director of Customer Advocacy E-Learning Heroes - NEW forums, tutorials, downloads, and more! |
||
|
|
|
|
# 9 | ||
|
Member
Join Date: Aug 2006
Posts: 85
|
Could you please tell us why the domain (or at least my domain) was at times pointing to an online TV company with what appeared to be some kind of Flash website? (This was when the https:// domain was used.)
|
||
|
|
|
|
# 10 | ||
|
Moderator
Join Date: Feb 2006
Location: Boulder, CO
Posts: 985
|
Hello awkeogh,
> Could you please tell us why the domain (or at least my domain) was at times pointing to an online TV company with what appeared to be some kind of Flash website? (This was when the https:// domain was used.)
The reason you were seeing this is that the website in question was hosted on the server that was brought up with the conflicting IP/MAC address, and this caused some of the traffic to be routed to the wrong machine.
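As a rough illustration of how a conflict like this might be spotted, the Python sketch below flags any address that shows up on more than one switch port. The thread does not say how the datacenter actually found the conflicting server, so everything here is an assumption: the function name `find_duplicate_addresses`, the `(address, port)` input format, and the sample values (an IP from the reserved documentation range and made-up port names).

```python
from collections import defaultdict


def find_duplicate_addresses(sightings):
    """Return {address: sorted list of ports} for addresses seen on 2+ ports.

    `sightings` is an iterable of (address, switch_port) pairs, for example
    exported from a switch's address table. The format is hypothetical.
    """
    ports_by_address = defaultdict(set)
    for address, port in sightings:
        ports_by_address[address].add(port)
    return {
        address: sorted(ports)
        for address, ports in ports_by_address.items()
        if len(ports) > 1
    }


# Made-up sample data: one address appears behind two different ports.
sample = [
    ("192.0.2.10", "A1"),
    ("192.0.2.11", "A2"),
    ("192.0.2.10", "B4"),
]

print(find_duplicate_addresses(sample))  # {'192.0.2.10': ['A1', 'B4']}
```

An address that legitimately moves between ports would also be flagged, so a result like this is a prompt for a closer look rather than proof of a conflict.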
__________________
Dave Mozealous Quality Assurance Manager Visit my blog @ http://www.mozealous.com You should follow me on twitter here. Last edited by dmozealous : 01-16-2008 at 06:37 AM. Reason: provided more detail. |
||