Status

March 20th, 2014 - MaxCDN Failure Causes Widespread Problems

Today we had problems with our website that were caused by our "Content Distribution Network (CDN)" provider, a company called MaxCDN.  Last weekend we started using MaxCDN as a way to increase the speed and (in theory) the reliability of our website.  A CDN is a network of computers that copies parts of our website and puts those copies on servers located all over the world so that the content is quicker to download.

After testing MaxCDN's services extensively, we rolled it out on our site earlier this week.  We did the roll-out in stages, being careful to test things at each stage.  Everything went great until this morning when we rolled out the final piece of the puzzle.  As soon as that part was added, all heck broke loose and MaxCDN's servers suddenly stopped sending out ANY of our files.  We have been scrambling to recover ever since.

The part we added this morning was really very innocent - we just checked a checkbox telling MaxCDN to add a server in Singapore to our service so that people in Asia would get our pages faster.  Evidently, that was a bad idea.

The first problem happened at 9am Eastern.  We immediately took steps to disable (but not remove) our use of MaxCDN.  We then started waiting for an explanation from them as to what exactly went wrong and why.  At that point we hoped that they would tell us that this was a one-time unusual circumstance and that we could re-enable the performance improvements that MaxCDN provided.

Unfortunately, at 1pm Eastern, the problem suddenly reappeared.  At that point, we threw in the towel and removed MaxCDN from our system completely.
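
For the technically curious, here is a minimal sketch of what "disabling but not removing" a CDN can look like.  This is not our actual code.  The host names and the asset_url function are made up for illustration; the idea is simply that one switch decides whether static files are requested from the CDN or directly from our own servers.

```python
# Hypothetical host names, for illustration only.
CDN_HOST = "cdn.example.com"
ORIGIN_HOST = "www.example.com"


def asset_url(path: str, use_cdn: bool) -> str:
    """Build the URL for a static file.

    When use_cdn is False the CDN configuration is still in place,
    but every file is requested directly from the origin servers.
    """
    host = CDN_HOST if use_cdn else ORIGIN_HOST
    return f"https://{host}/{path.lstrip('/')}"


# CDN enabled: files come from the CDN's copies.
print(asset_url("/img/logo.png", use_cdn=True))

# CDN disabled (but not removed): files come straight from the origin.
print(asset_url("/img/logo.png", use_cdn=False))
```

Removing the CDN completely would correspond to deleting the switch and the CDN host altogether, so that the origin is the only possible source.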

If you are still seeing problems with our website, you may need to clear your browser's cache and restart your computer to get rid of any lasting vestiges of the old code.

Again, we are always looking for ways to make our service faster and more reliable.  At some point, we will add a reliable CDN to our system.  We thought we had found a good one.  We will keep looking.  Our apologies for whatever issues today's problems have caused.

- Chip

 

UPDATE 3/21 - The people at MaxCDN are working very hard to try and win us back and we appreciate their efforts.  They have explained that we somehow hit a very rare problem in their system and they have fixed that problem.  They are making sure that our account is properly configured by inspecting it with a fine-toothed comb.  Once that process is complete, we will start testing things again _very_ slowly and carefully.  Hopefully, things will go more smoothly this time.  We'll keep you up-to-date.

UPDATE 3/25 - After MaxCDN's technical boffins gave us the all clear, we have gradually re-added our static content back onto their network and so far so good.  People throughout the world are now getting those files much faster than before and that in turn means our datacenter is able to send out our charts more efficiently.  Knock on wood, but it appears that our 2nd try at rolling things out has worked.

Feb. 18th, 2014 @ 11:20am - Denial of Service Attack on our ISP Affecting StockCharts.com

5:00pm Update:  Internap has reported back to us with the following information:

  • StockCharts.com was not the target of the Denial-of-Service attack.  We were "collateral damage."
  • The problem happens when the DOS attack swamps Internap's connection to the Cogent backbone (one of several backbones they use).
  • People who are accessing our site via that backbone lost their connections pretty much immediately.
  • Unfortunately, that slow backbone eventually causes our servers to "fill up" with charts waiting to go out.  Once that happens our site stops working for everyone.
  • This latest attack lasted much longer (about 1 hour) than the previous ones.
  • Internap installed new software and new procedures today to help cope with future attacks.
  • Internap is prepared to move us off of the Cogent backbone if this issue happens again.
  • Internap is investing in a long-term solution that will completely eliminate this issue in the future.  That solution will be in place "in the summer."

We are satisfied that these steps will help mitigate the effects of future attacks.  If it does happen again, we will ask Internap to move us off of the Cogent backbone.

 

12:50pm Update:  Internap is reporting that everything is back to normal and working again.  We don't agree.  Because this is the third time this has happened in the past year, we have notified their Senior VP of Technology about this issue.  We'll be following up with them aggressively to try and determine why this continues to happen.

 

11:45am Update:  Things are finally starting to work again.  We are continuing to monitor the situation and are still waiting for a response from Internap on why this continues to happen.  We will post an update again after we hear back from them.

 

11:20am: Around 10:45am Eastern, we started seeing problems with the delivery of our charts to our users.  

At 11:15am, we received the following message from our ISP:

"At approximately 07:54 PST our network monitoring system notified us of an increase in traffic over our upstream provider, Cogent, in our SEF PNAP location. The increase in traffic is due to a DoS. Customer traffic inbound over Cogent would experiencing packet loss and latency during this time. At this time we are investigating the issue and will update customers once the traffic has."

We are currently still seeing issues with our service.  We have called our ISP about this issue and they are working on resolving it.  We apologize for the problems this issue has caused our users.  Once service is restored, we will be contacting Internap and asking for a better solution to these kinds of situations.

Feb 10th, 2014 @ 11:10am - ISP Issue with Cogent Backbone

Feb. 10th, 2014 @ 11:30am - When it rains it pours (and yes, it is pouring rain in Seattle this morning).  Completely independent from our login/database issue earlier today, our ISP has just reported that they had to reroute traffic from one of the major Internet "backbones" that connect them to the rest of the world.  This appears to have affected about 40% of our users.  The outage appears to have occurred over the period from 11:10am to 11:15am Eastern time.  Things appear to be back to normal now.

Again, if you are unable to connect to our website or our website is slow, be sure to check our status blog:
    http://blogs.stockcharts.com/status

and also our Pingdom Uptime report:
    http://stats.pingdom.com/g0fdmqv6vgnb

Feb 10th, 2014 @ 8:30am - Clogged Database Preventing Logins and Account Changes

Feb 10th, 2014 @ 8:30am - One of our databases is not responding and that is causing all logins to be rejected.  Our data team is investigating.  Thank you for your patience.

Update: 9:00am - The database in question is fine, just not allowing updates until we clear out some more space.  Things should be back to normal in about 30 minutes.

Update: 9:30am - Down to the wire.  We are almost there but it looks like we will miss the market open.  Our apologies for that.

Update: 9:40am - Things are working again!  We will continue to monitor the situation closely.

Update: 9:45am - We think things are much better.  Our data team is continuing to investigate to find the root cause of this issue and make sure it doesn't happen again.  Thanks again for your patience.

By the way, as you might imagine, our support queues were swamped this morning.  If you sent in a support request this morning about this issue, thank you but unfortunately we will not be able to reply individually.  If you sent in a support request this morning about an unrelated issue, you probably should send it in again.

Dec. 19, 2013 @ 12pm - DDoS Attack at ISP Interrupted Service

At 12:05pm Eastern time we noticed a big drop in traffic coming to our website.  People located outside of our offices soon reported slowness with the website and soon they were unable to connect.  An initial investigation showed no problems with our systems.  We posted some notifications on Twitter (@stockchartscom) about the situation as we continued to diagnose the issue.  At 12:27pm we received a message from InterNAP - our ISP - informing us that they had experienced a large Denial of Service attack and had taken steps to mitigate its effects.  Soon after that, our systems returned to normal.

As of now - 12:45pm Eastern - things are working properly.  We will continue to monitor the situation closely.

We apologize for whatever inconvenience this issue has caused; however, we are at the mercy of our ISP when situations like this occur.  Given the unpredictable nature of these kinds of attacks, we believe that InterNAP handled things as quickly and as effectively as possible.  We will follow up with them to make sure they are taking steps to further reduce the possibility of future attacks.

Jan 4th @ 2pm - Networking Issues with our ISP - FIXED

UPDATE @ 3:30PM:
From our ISP:

"We have found and addressed a DDOS attack on our network at approximately 10:52 PST.  Due to the extreme size of the attack providers experienced periods of increased utilization and caused some packet loss and latency within the PNAP.  We have been monitoring all provider links for stability and have seen no further issues since the initial attack was resolved.  We will continue to monitor going forward and update if there are any further issues."

We, StockCharts.com, apologize for the interruption that this event caused.  We will be following up with InterNAP to see if they can update their procedures to prevent this kind of thing from happening again.  A DDOS attack on some other website should NOT affect us like this.

Again, thank you for your understanding and support.
  - Chip

 

ORIGINAL POST:
We just received the following notice from InterNAP, our ISP:

"Our routing team is investigating an issue that is affecting multiple circuits in our Seattle data center locations. We are currently investigating this issue and will send updates as we receive them."

This notice coincides with multiple reports from our users of trouble accessing our website.  We don't believe that the problem affects everyone equally.  Check back here for updates throughout the day.

Aug 20th@12:00pm - Our ISP's Backbone Dropouts Causing Slowness (FIXED)

4:30pm Eastern:  We haven't received any more messages from our ISP - no more slowness "episodes" either.  We suspect that they will get the 2 broken backbones fixed and switch things back to normal around midnight tonight.  There may be a bump or two when that happens.  It is possible that some residual slowness will remain for a small number of users over the next day or so as the Internet "re-balances" the traffic routes for our website.  Let our support team know if you see significant issues with speed after tomorrow.

Unfortunately, we may never know the real culprit behind the outage but I'm guessing it was either a bad router, a guy with a backhoe, or a system administrator deploying a buggy patch.  This is the first time in many years that our ISP has caused a noticeable outage like this.  We will follow up and make sure that they will take steps to prevent these kinds of problems in the future.

This will probably (hopefully) be our last update on this incident.  Thanks again for your patience and understanding.

2:00pm Eastern: Message from our ISP - "At 10:30 PDT our our third level engineer shut down our BGP session with our upstream provider XO. Customer would have noticed routing reconvergence and sub-optimal routing as traffic was re-routed over our other available providers at the SEF PNAP.  The XO BGP session will remain down until we notify customer prior to the restoring of the BGP session."

Translation:  Our ISP uses multiple backbones to connect to the Internet.  They've identified two of those backbones (Sprint and XO) as having problems this morning for unknown reasons.  They have moved our traffic off of those two problem backbones and onto other backbones.  That means that for many people, the extreme slowness should be gone; however, some people will see slower than normal charts until they get those two misbehaving backbones working again.

Again, thanks for your patience as they sort through all these issues.

 

1:25pm Eastern: Message from our ISP - "We have shut down our BGP session with our Sprint provider as of 10:10 PDT. We will send out a notification to customers prior to restoring the BGP session with Sprint. Our third level engineers are still investigating the over utilization issue with our Sprint provider and we will provide additional information once it is available."

We - StockCharts - noticed a significant improvement in response times around the time that they turned off Sprint.  Hopefully things are better for many of you now.  We will continue to keep you updated.

1:07pm Eastern:  Message from our ISP - "Our third level engineers are continuing to investigate the issue for resolution. We have vendor tickets open as well to assist in the investigation. At this time we do not have an ETTR however we are working as quickly as possible for the issue to be resolved We will provide additional information as soon as it is available." 

(ETTR - "Estimated Time To Resolution")

12:40pm Eastern: We are experiencing another "episode" at the moment.  Thank you for your patience.  We will pass along any more news as we receive it from our ISP.

12:00pm Eastern: This morning our ISP is reporting problems with one of their backbone connections to the Internet.  InterNAP has told us that they have experienced 2 big "packet loss" events so far - one around 10:40am Eastern and a second one around 11:50am Eastern.  They are re-routing traffic and continuing to investigate.

The issues appear to have affected a large number of our users with slow response times and missing charts.  For that we apologize.  At this point things are working fine but we can't be sure that it won't happen again because the issue is essentially "out of our hands."  We expect and believe that  InterNAP will have things fixed shortly.

Check back here for updates over the course of the day.

Feb. 23, 2012 - FIXED: Annotations Occasionally Not Displayed

12:00pm - The problem has been fixed.  All annotations should be appearing again reliably.  Again, sorry for whatever problems this issue has caused.

 

10:45am - Annotations are not always being displayed on their charts this morning.  The annotations themselves are fine - they are still stored in our database.  It's just that our charting program is not always including them on the charts.  It seems to be happening about once every 10 times a chart is drawn (although that is an average and it could be more or less frequent at any one time).

We are working on tracking down the problem and expect to have it fixed well before the end of the day.

In the meantime, if you see a chart that doesn't have the annotations you expect, just refresh your page and the annotations should appear.

We apologize for whatever inconvenience this has caused.

Late Trade Activity on Oct. 27th Caused Inconsistent Closing Values for Some NYSE Stocks

We have been investigating reports from users that our data for the end of trading on Thursday, October 27th did not match other sources for several stocks and ETFs.  On our own, we discovered that while our data matched some public sources such as Bloomberg.com and Yahoo.com's Historical data page, it did not match other sources including Google Finance and Yahoo.com's main quote page.

While we were in the middle of getting to the bottom of this conundrum, we received the following message from our data vendor:

"Please be advised that due to late trade activity, there may be inconsistancies in Composite Close from Pacific Exchange for an unknown number of securities. We are currently investigating this with the NYSE and will provide further information as it becomes available.  Please see the attached Trader Update link posted by NYSE.
http://www.nyse.com/pdfs/NOTICE_Duplicated_Extended_Session_Prints_on_October_27_2011.pdf "

The bottom line is that something screwed up the closing data for an "unknown number" of NYSE stocks yesterday and that screw up affected pretty much everyone.  At this point we do not know if our closing data for yesterday is correct or not.  We expect that at some point later today, we will get an update with the new "correct" numbers from the NYSE.  At that point, we will update our databases and the charts may change as a result.

Sorry for the problem, but the issue is "upstream" from us, apparently at the NYSE itself.

March 28, 2011 - Intraday Data Offset by One Hour - Now Fixed

Our programming team made a mistake last month when they updated our servers to support London stocks.  The mistake was assuming that London time was the same as Greenwich Mean Time (GMT) which is what our datafeed uses for its time fields.  Up until Sunday, London time actually was the same as GMT, but that was just a coincidence.  On Sunday, London time changed to be one hour different from GMT and that is what caused our system to display incorrect intraday charts this morning.

At this point, most of the problems have been fixed (LSE stocks will be corrected shortly).  The programmers have been re-educated about time zones.  And, because this was due to a mistake on our part, all current members have had a free day of service added to their account.
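
For readers who want to see the mistake in miniature, here is a small sketch of the difference between GMT and London time.  It is not our production code.  The function names are made up for illustration, and the rule it encodes (British Summer Time runs from 01:00 GMT on the last Sunday of March to 01:00 GMT on the last Sunday of October) is written out by hand to keep the example self-contained.  Real systems would normally rely on a maintained time zone database instead.

```python
from datetime import datetime, timedelta, timezone


def last_sunday_0100_utc(year: int, month: int) -> datetime:
    """Return 01:00 UTC on the last Sunday of the given month."""
    # Find the last day of the month by stepping back from the 1st of the next month.
    if month == 12:
        first_of_next = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        first_of_next = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    day = first_of_next - timedelta(days=1)
    # weekday() is 0 for Monday and 6 for Sunday; step back to the nearest Sunday.
    day -= timedelta(days=(day.weekday() - 6) % 7)
    return day.replace(hour=1)


def london_offset(utc_time: datetime) -> timedelta:
    """Offset of London time from GMT at the given moment (0 or +1 hour)."""
    summer_start = last_sunday_0100_utc(utc_time.year, 3)
    summer_end = last_sunday_0100_utc(utc_time.year, 10)
    if summer_start <= utc_time < summer_end:
        return timedelta(hours=1)
    return timedelta(0)


def feed_time_to_london(utc_time: datetime) -> datetime:
    """Convert a GMT timestamp from the data feed into London wall-clock time."""
    return utc_time.astimezone(timezone(london_offset(utc_time)))


# Friday before the change: GMT and London time agree.
friday = datetime(2011, 3, 25, 8, 0, tzinfo=timezone.utc)
print(feed_time_to_london(friday).strftime("%H:%M"))

# Monday after the change: London is one hour ahead of GMT.
monday = datetime(2011, 3, 28, 7, 0, tzinfo=timezone.utc)
print(feed_time_to_london(monday).strftime("%H:%M"))
```

Code that treats the feed's GMT timestamps as London time gets the right answer in the first case and is off by one hour in the second, which matches what happened to our intraday charts.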
