1. Articles from Perspectives

    perspectives.mvdirona.com

    1. Spot Instances, Big Clusters, & the Cloud at Work

      Explore Perspectives (Sep 20 2011)

      If you have read this blog in the past, you’ll know I view cloud computing as a game changer (Private Clouds are not the Future) and spot instances as a particularly powerful innovation within cloud computing. Over the years, I’ve enumerated many of the advantages of cloud computing over private infrastructure deployments. A particularly powerful advantage comes from combining a large number of non-correlated workloads: overall infrastructure utilization is far higher for most workload combinations. This is partly because the reserve capacity needed to ensure that all workloads can meet peak demand is a tiny fraction of what is required to provide reserve surge capacity for each job individually. This factor alone is a huge gain, but an even bigger gain can be found by noting that all workloads are cyclic and go through sinusoidal capacity peaks and troughs. Some ... (Read Full Article)

      Comment Mentions:   Amazon.com   James Hamilton
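
The pooling argument in the excerpt above can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions, not a model from the article: the workload count, the base demand of 10 units, the 40-unit surge, and the 5% surge probability are all hypothetical values chosen only to show the effect.

```python
import random


def reserve_comparison(num_workloads=200, hours=24 * 7, seed=0):
    """Compare capacity needed per-workload vs. pooled (hypothetical demand model)."""
    rng = random.Random(seed)
    # Assumption: each workload needs 10 units per hour, plus a 40-unit surge
    # that occurs independently in about 5% of hours (non-correlated workloads).
    demand = [
        [10 + (40 if rng.random() < 0.05 else 0) for _ in range(hours)]
        for _ in range(num_workloads)
    ]
    # Dedicated infrastructure: every workload is provisioned for its own peak.
    sum_of_peaks = sum(max(series) for series in demand)
    # Shared infrastructure: provision once for the peak of the combined load.
    peak_of_sum = max(
        sum(series[h] for series in demand) for h in range(hours)
    )
    return sum_of_peaks, peak_of_sum


if __name__ == "__main__":
    dedicated, pooled = reserve_comparison()
    print(f"dedicated capacity: {dedicated}, pooled capacity: {pooled}")
```

Because independent surges rarely line up in the same hour, the pooled peak comes out well below the sum of the individual peaks. Correlated workloads would narrow that gap.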

    2. 2011 European Data Center Summit

      Explore Perspectives (May 25 2011)

      The European Data Center Summit 2011 was held yesterday at SihlCity CinCenter in Zurich. Google Senior VP Urs Hoelzle kicked off the event talking about why data center efficiency was important both economically and socially. He went on to point out that the oft-quoted number that US data centers represent 2% of total energy consumption is usually misunderstood. The actual data point is that 2% of the US energy budget is spent on IT, of which the vast majority is client side systems. This is unsurprising but a super important clarification. The full breakdown of this data:

      ·  2% of US power
           o  Datacenters: 14%
           o  Telecom: 37%
           o  Client Device: 50%

      The net is that 14% of 2%, or 0.28%, of the US power budget is consumed in datacenters. This is a far smaller but still very relevant number. In fact, that is the primary motivator ... (Read Full Article)

      Comment Mentions:   James Hamilton
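
The arithmetic behind the 0.28% figure in the excerpt above can be checked directly. The sketch below uses only the percentages quoted there; the variable and function names are illustrative. Note that the three quoted shares sum to 101%, presumably because of rounding in the source data.

```python
from fractions import Fraction

# Share of total US power spent on IT, as quoted in the excerpt.
IT_SHARE_OF_US_POWER = Fraction(2, 100)

# Breakdown of that IT share, as quoted (sums to 101%, presumably rounding).
IT_BREAKDOWN = {
    "Datacenters": Fraction(14, 100),
    "Telecom": Fraction(37, 100),
    "Client Device": Fraction(50, 100),
}


def share_of_us_power(category):
    """Return the category's share of total US power as an exact fraction."""
    return IT_SHARE_OF_US_POWER * IT_BREAKDOWN[category]


if __name__ == "__main__":
    for name in IT_BREAKDOWN:
        percent = float(share_of_us_power(name) * 100)
        print(f"{name}: {percent:.2f}% of US power")
```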

    3. European Data Center Efficiency Summit

      Explore Perspectives (Apr 29 2011)

      Google cordially invites you to participate in a European Summit on sustainable Data Centres. This event will focus on energy-efficiency best practices that can be applied to multi-MW custom-designed facilities, office closets, and everything in between. Google and other industry leaders will present case studies that highlight easy, cost-effective practices to enhance the energy performance of Data Centres. The summit will also include a dedicated session on cooling. Presenters will detail climate-specific implementations of free cooling as well as novel ways to utilise locally available opportunities. We will also debate climate-independent PUE targets. The agenda includes presentations and panel discussions featuring Amazon, DeepGreen, eBay, Google, IBM, Microsoft, Norman Disney & Young, PlusServer, Telecity Group, The Green Grid, UK's Chartered Institute for IT, UBS and others. Attendance is free. However, space is limited and we therefore encourage you to register online at your earliest convenience. Your participation will be confirmed. We ... (Read Full Article)

      Comment Mentions:   Amazon.com   Google   The Green Grid

    4. More Data on Datacenter Air Side Economization

      Explore Perspectives (Mar 15 2011)

      Two of the highest leverage datacenter efficiency improving techniques currently sweeping the industry are: 1) operating at higher ambient temperatures, and 2) air-side economization with evaporative cooling. The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) currently recommends that servers not be operated at inlet temperatures beyond 81F. (Read Full Article)

      Comment Mentions:   ASHRAE   James Hamilton

    5. Yahoo! Compute Coop Design

      Explore Perspectives (Mar 5 2011)

      Christina Page, Director of Climate & Energy Strategy at Yahoo!, spoke at the 2010 Data Center Efficiency Summit where she presented Yahoo! Compute Coop Design. The primary attributes of the Yahoo! design are: 1) 100% free air cooling (no chillers), 2) slab concrete floor, 3) use of wind power to augment air handling units, and 4) pre-engineered building for construction speed. Christina reports that the idea of orienting the building so that the wind bears on the external wall facing the dominant wind direction, and using this higher pressure to assist the air handling units, came from looking at farm buildings in the Buffalo, New York area. An example given was the use of natural cooling in chicken coops. (Read Full Article)

      Comment Mentions:   Yahoo   James Hamilton   Microsoft Corp

    6. Exploring the Limits of Datacenter Temperature

      Explore Perspectives (Feb 27 2011)

      Datacenter temperature has been ramping up rapidly over the last 5 years. In fact, leading operators have been pushing temperatures up so quickly that the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) recommendations have become a trailing indicator of what is being done rather than current guidance. ASHRAE responded in January of 2009 by raising the recommended limit from 77F to 80.6F (HVAC Group Says Datacenters Can be Warmer). This was a good move but many of us felt it was late and not nearly a big enough increment. Earlier this month, ASHRAE announced they are again planning to take action and raise the recommended limit further but haven’t yet announced by how much (ASHRAE: Data Centers Can be Even Warmer). Many datacenters are operating reliably well in excess of even the newest ASHRAE recommended temp of 81F. For example, back in 2009 Microsoft announced they were operating ... (Read Full Article)

      Comment Mentions:   Sun Microsystems   ASHRAE   James Hamilton

    7. Speeding Up Cloud/Server Applications With Flash Memory

      Explore Perspectives (Feb 6 2011)

      Last week, Sudipta Sengupta of Microsoft Research dropped by the Amazon Lake Union campus to give a talk on the flash memory work that he and the team at Microsoft Research have been doing over the past year. It’s super interesting work. You may recall Sudipta as one of the co-authors on the VL2 Paper (VL2: A Scalable and Flexible Data Center Network) I mentioned last October. Sudipta’s slides for the flash memory talk are posted at Speeding Up Cloud/Server Applications With Flash Memory and my rough notes follow:

      ·  Technology has been used in client devices for more than a decade
      ·  Server side usage is more recent, and the differences between hard disk drive and flash characteristics bring some challenges that need to be managed in the on-device Flash Translation Layer (FTL) or in the operating system or application layers
      ·  Server requirements are more aggressive across several dimensions including ... (Read Full Article)

      Comment Mentions:   Amazon.com   Microsoft Research   James Hamilton

    8. Datacenter Networks are in my Way

      Explore Perspectives (Oct 31 2010)

      I did a talk earlier this week on the sea change currently taking place in datacenter networks. In Datacenter Networks are in my Way I start with an overview of where the costs are in a high scale datacenter. With that backdrop, we note that networks are fairly low power consumers relative to the total facility consumption and not even close to the dominant cost. Are they actually a problem? The rest of the talk argues that networks are actually a huge problem across the board, including cost and power. Overall, networking gear lags behind the rest of the high-scale infrastructure world, blocks many key innovations, and actually is both a cost and a power problem when we look deeper. The overall talk agenda:

      ·  Datacenter Economics
      ·  Is Net Gear Really the Problem?
      ·  Workload Placement Restrictions
      ·  Hierarchical & Over-Subscribed
      ·  Net Gear: SUV of the Data Center
      ·  Mainframe Business Model
      ·  Manually Configured & Fragile at Scale ... (Read Full Article)

      Comment Mentions:   Cisco   James Hamilton

    9. Netflix Migration to the Cloud

      Explore Perspectives (Oct 10 2010)

      This morning I came across a super interesting article written by Sid Anand, an architect at Netflix. I liked it for two reasons: 1) it talks about the move of substantial portions of a high-scale web site to the cloud, some of how it was done, and why it was done, and 2) it gives best practices on AWS SimpleDB usage. I love articles about how high scale systems work. Some past postings:

      ·  FriendFeed use of MySQL
      ·  Facebook Cassandra Architecture and Design
      ·  Wikipedia Architecture
      ·  MySpace Architecture and .Net
      ·  Flickr DB Architecture
      ·  Geo-Replication at Facebook
      ·  Scaling at LucasFilms
      ·  Facebook: Needle in a Haystack: Efficient Storage of Billions of Photos
      ·  Scaling LinkedIn
      ·  Scaling at MySpace

      The article starts off by explaining why Netflix decided to move their infrastructure to the cloud: Circa late 2008, Netflix had a single data center. This single data center raised a few concerns. As a ... (Read Full Article)

      Comment Mentions:   Amazon.com   Oracle   James Hamilton

    10. Scaling AWS Relational Database Service

      Explore Perspectives (Oct 9 2010)

      Hosting multiple MySQL engines with MySQL Replication between them is a common design pattern for scaling read-heavy MySQL workloads. As with all scaling techniques, there are workloads for which it works very well but there are also potential issues that need to be understood. In this case, all write traffic is directed to the primary server and, consequently, is not scaled, which is why this technique works best for workloads heavily skewed towards reads. But, for those fairly common read heavy workloads, the technique works very well and allows scaling the read workload across a fleet of MySQL instances. Of course, as with any asynchronous replication scheme, the read replicas are not transactionally updated. So any application running on MySQL read replicas must be tolerant of eventually consistent updates. Load balancing high read traffic over multiple MySQL instances works very well but this is only one of the ... (Read Full Article)

      Comment Mentions:   Amazon.com   James Hamilton   Facebook
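
The read-scaling pattern described in the excerpt above can be sketched as a small statement router. This is an illustrative sketch only: the `ReplicaRouter` class, the server names, and the "starts with SELECT" test are hypothetical simplifications, not part of MySQL or of any AWS API. It sends all writes to the primary, spreads reads round-robin over the replicas, and lets a caller that cannot tolerate eventually consistent data force a read onto the primary.

```python
import itertools


class ReplicaRouter:
    """Pick a server for each SQL statement (hypothetical helper, not a real driver)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql, require_fresh=False):
        # Naive classification: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or require_fresh or self._replicas is None:
            # Writes are never scaled out; they always land on the primary.
            return self.primary
        # Replicas are updated asynchronously, so this read may be slightly stale.
        return next(self._replicas)


if __name__ == "__main__":
    router = ReplicaRouter("primary", ["replica-1", "replica-2"])
    for stmt in ("INSERT INTO t VALUES (1)", "SELECT * FROM t", "SELECT 1"):
        print(stmt, "->", router.route(stmt))
```

Adding replicas in this scheme raises read capacity but does nothing for write throughput, which is the limitation the excerpt points out.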

    11. Overall Data Center Costs

      Explore Perspectives (Sep 18 2010)

      A couple of years ago, I did a detailed look at where the costs are in a modern, high-scale data center. The primary motivation behind bringing all the costs together was to understand where the problems are and find those easiest to address. Predictably, when I first brought these numbers together, a few data points just leapt off the page: 1) at scale, servers dominate overall costs, and 2) mechanical system cost and power consumption seem unreasonably high. Both of these areas have proven to be important technology areas to focus upon and there has been considerable industry-wide innovation, particularly in cooling efficiency, over the last couple of years. I posted the original model at the Cost of Power in Large-Scale Data Centers. One of the reasons I posted it was to debunk the often repeated phrase “power is the dominant cost in a large-scale data center”. Servers dominate with ... (Read Full Article)

      Comment Mentions:   Google   James Hamilton

    12. Energy Proportional Datacenter Networks

      Explore Perspectives (Aug 1 2010)

      A couple of weeks back Greg Linden sent me an interesting paper called Energy Proportional Datacenter Networks. The principle of energy proportionality was first coined by Luiz Barroso and Urs Hölzle in an excellent paper titled The Case for Energy-Proportional Computing. The core principle behind energy proportionality is that computing equipment should consume power in proportion to its utilization level. For example, a computing component that consumes N watts at full load should consume X/100*N watts when running at X% load. This may seem like an obviously important concept but, when the idea was first proposed back in 2007, it was not uncommon for a server running at 0% load to be consuming 80% of full load power. Even today, you can occasionally find servers that poor. The incredible difficulty of maintaining near 100% server utilization makes energy proportionality a particularly important concept. One of the wonderful aspects ... (Read Full Article)

      Comment Mentions:   Cisco   James Hamilton
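
The X/100*N rule in the excerpt above is easy to put into numbers. The sketch below compares an ideal energy-proportional component with a server that draws 80% of full-load power at idle, the figure quoted in the excerpt. The linear ramp from idle to full load and the 500 W full-load figure are assumptions made for illustration, not taken from the article or the cited papers.

```python
def proportional_power(full_load_watts, load_percent):
    """Ideal energy-proportional draw: X/100 * N watts at X% load."""
    return load_percent / 100 * full_load_watts


def non_proportional_power(full_load_watts, load_percent, idle_fraction=0.8):
    """Draw for a server with a fixed idle floor (assumed linear ramp above it)."""
    dynamic_fraction = (1 - idle_fraction) * load_percent / 100
    return full_load_watts * (idle_fraction + dynamic_fraction)


if __name__ == "__main__":
    full_load = 500  # hypothetical full-load draw in watts
    for load in (0, 30, 100):
        ideal = proportional_power(full_load, load)
        actual = non_proportional_power(full_load, load)
        print(f"{load:3d}% load: ideal {ideal:.0f} W, 80%-idle server {actual:.0f} W")
```

The two curves meet only at 100% load, which is why the difficulty of keeping servers fully utilized makes proportionality matter.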

    13. The New World Order

      Explore Perspectives (Jun 7 2010)

      Industry trends come and go. The ones that stay with us and have lasting impact are those that fundamentally change the cost equation. Public clouds clearly pass this test. The potential savings approach 10x and, in cost-sensitive industries, those that move to the cloud fastest will have a substantial cost advantage over those that don’t. And, as much as I like saving money, the much more important game changer is speed of execution. Those companies depending upon public clouds will be noticeably more nimble. Project approval to delivery times fall dramatically when there is no capital expense to be approved. When the financial risk of new projects is small, riskier projects can be tried. The pace of innovation increases. Companies where innovation is tied to the financial approval cycle and the hardware order-to-install lag are at a fundamental disadvantage. Clouds change companies for the better, clouds drive down ... (Read Full Article)

      Comment Mentions:   James Hamilton

    14. PUE is Still Broken and I still use it

      Explore Perspectives (May 25 2010)

      PUE is still broken and I still use it. For more on why PUE has definite flaws, see: PUE and Total Power Usage Efficiency. However, I still use it because it’s an easy to compute summary of data center efficiency. It can be gamed endlessly but it’s easy to compute and it does provide some value. Improvements are underway in locking down the most egregious abuses of PUE. Three were recently summarized in Technical Scribblings RE Harmonizing Global Metrics for Data Center Energy Efficiency. In this report from John Stanley, the following were presented:

      ·  Total energy to include all forms of energy whether electric or otherwise (e.g. gas fired chiller must include chemical energy being employed). I like it but it’ll be a challenge to implement
      ·  Total energy should include lighting, cooling, and all support infrastructure. We already knew this but it’s worth clarifying since ... (Read Full Article)

      Comment Mentions:   Data Center Efficiency   James Hamilton

    15. Computer Room Evaporative Cooling

      Explore Perspectives (May 14 2010)

      I recently came across a nice data center cooling design by Alan Beresford of EcoCooling Ltd. In this approach, EcoCooling replaces the CRAC units with a combined air mover, damper assembly, and evaporative cooler. I’ve been interested in evaporative coolers and their application to data center cooling for years and they are becoming more common in modern data center deployments (e.g. Data Center Efficiency Summit). An evaporative cooler is a simple device that cools air by taking water through a state change from liquid to vapor. They are incredibly cheap to run and particularly efficient in locales with lower humidity. Evaporative coolers can allow the power intensive process-based cooling to be shut off for large parts of the year. And, when combined with favorable climates or increased data center temperatures, they can entirely replace air conditioning systems. See Chillerless Datacenter at 95F; for a deeper discussion see Costs of ... (Read Full Article)

      Comment Mentions:   James Hamilton

    16. Inter-Datacenter Replication & Geo-Redundancy

      Explore Perspectives (May 10 2010)

      Wide area network costs and bandwidth shortage are the single most common reason why many enterprise applications run in a single data center. Single data center failure modes are common. There are many external threats to single data center deployments including utility power loss, tornado strikes, facility fire, network connectivity loss, earthquake, break in, and many others I’ve not yet been “lucky” enough to have seen. And, inside a single facility, there are simply too many ways to shoot one’s own foot. All it takes is one well intentioned networking engineer to black hole the entire facility’s networking traffic. Even very high quality power distribution systems can have redundant paths taken out by fires in central switch gear or cascading failure modes. And, even with very highly redundant systems, if the redundant paths aren’t tested often, they won’t work. Even with incredible redundancy, just having the ... (Read Full Article)

      Comment Mentions:   Amazon.com   James Hamilton

    17. Is Sandia National Lab's Red Sky Really Able to Deliver a PUE of 1.035?

      Explore Perspectives (Nov 22 2009)

      Some time back I whined that Power Usage Effectiveness (PUE) is a seriously abused term: PUE and Total Power Usage Efficiency. But I continue to use it because it gives us a rough way to compare the efficiency of different data centers. It’s a simple metric that takes the total power delivered to a facility (total power) and divides it by the amount of power delivered to the servers (critical power or IT load). A PUE of 1.35 is very good today. Some datacenter owners have claimed to be as good as 1.2. Conventionally designed data centers operated conservatively are in the 1.6 to 1.7 range. Unfortunately most of the industry has a PUE of over 2.0, some are as bad as 3.0, and the EPA reports the industry average is 2.0 (Report to Congress on Server Data Center Efficiency). A PUE of ... (Read Full Article)

      Comment Mentions:   Intel   Norway   Lawrence Livermore National Laboratory
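
The PUE definition in the excerpt above (total facility power divided by power delivered to the IT load) translates directly into code. In the sketch below the function names and the 1,000 kW IT load are hypothetical; the PUE values in the loop are the ones quoted in the excerpt and the article title.

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power / critical (IT) power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(it_load_kw, pue_value):
    """Power spent on distribution losses and cooling for a given IT load."""
    return it_load_kw * (pue_value - 1.0)


if __name__ == "__main__":
    it_load = 1000.0  # hypothetical IT load in kW
    for value in (1.035, 1.2, 1.35, 1.7, 2.0, 3.0):
        extra = overhead_kw(it_load, value)
        print(f"PUE {value}: {extra:.0f} kW of overhead per {it_load:.0f} kW of IT load")
```

Read this way, a PUE of 2.0 means every watt delivered to servers costs a second watt in overhead, while the 1.035 claim in the title would leave only about 3.5% overhead.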

    18. Stanford Clean Slate CTO Summit

      Explore Perspectives (Oct 24 2009)

      I attended the Stanford Clean Slate CTO Summit last week. It was a great event organized by Guru Parulkar. Here’s the agenda:

      12:00: State of Clean Slate -- Nick McKeown, Stanford
      12:30: Software defined data center networking -- Martin Casado, Nicira
      1:00: Role of OpenFlow in data center networking -- Stephen Stuart, Google
      2:30: Data center networks are in my way -- James Hamilton, Amazon
      3:00: Virtualization and Data Center Networking -- Simon Crosby, Citrix
      3:30: RAMCloud: Scalable Datacenter Storage Entirely in DRAM -- John Ousterhout, Stanford
      4:00: L2.5: Scalable and reliable packet delivery in data centers -- Balaji Prabhakar, Stanford
      4:45: Panel: Challenges of Future Data Center Networking -- Panelists: James Hamilton, Stephen Stuart, Andrew Lambeth (VMWare), Marc Kwiatkowski (Facebook)

      I presented Networks are in my Way. My basic premise is that networks are both expensive and poor power/performers. But, much more important, they are in ... (Read Full Article)

      Comment Mentions:   Amazon.com   James Hamilton   Facebook

    19. Successfully Challenging the Server Tax

      Explore Perspectives (Sep 3 2009)

      The server tax is what I call the mark-up applied to servers, enterprise storage, and high scale networking gear. Client equipment is sold in much higher volumes with more competition and, as a consequence, is priced far more competitively. Server gear, even when using many of the same components as client systems, comes at a significantly higher price. Volumes are lower, competition is less, and there are often many lock-in features that help maintain the server tax. For example, server memory subsystems support Error Correcting Code (ECC) whereas most client systems do not. Ironically both are subject to many of the same memory faults and the cost of data corruption in a client before the data is sent to a server isn’t obviously less than the cost of that same data element being corrupted on the server. Nonetheless, server components typically have ECC while commodity client systems usually do ... (Read Full Article)

      Comment Mentions:   Amazon.com   James Hamilton

    20. Pictures from the Fisher Plaza Data Center Fire

      Explore Perspectives (Jul 10 2009)

      There have been many reports of the Fisher Plaza data center fire. An early one was the Data Center Knowledge article: Major Outage at Seattle Data Center. Data center fires aren’t as rare as any of us would like but this one is a bit unusual in that fires normally happen in the electrical equipment or switchgear whereas this one appears to have been a bus duct fire. The bus duct fire triggered the sprinkler system. Several sprinkler heads were triggered and considerable water was sprayed making it more difficult to get the facility back online quickly. Several good pictures showing the fire damage were recently published in Tech Flash Photos: Inside the Fisher Fire. ... (Read Full Article)

      Comment Mentions:   Amazon.com   Data Center Knowledge   James Hamilton

    21. ISCA 2009 Keynote II: Internet-Scale Service Infrastructure Efficiency

      Explore Perspectives (Jun 24 2009)

      I presented the keynote at the International Symposium on Computer Architecture 2009 yesterday. Kathy Yelick kicked off the conference with the other keynote on Monday: How to Waste a Parallel Computer. Thanks to ISCA Program Chair Luiz Barroso for the invitation and for organizing an amazingly successful conference. I’m just sorry I had to leave a day early to attend a customer event this morning. My slides: Internet-Scale Service Infrastructure Efficiency. Abstract: High-scale cloud services provide economies of scale of five to ten over small-scale deployments, and are becoming a large part of both enterprise information processing and consumer services. Even very large enterprise IT deployments have quite different cost drivers and optimization points from internet-scale services. The former are people-dominated from a cost perspective whereas internet-scale service costs are driven by server hardware and infrastructure with people costs fading into the noise at less than 10%. In this ... (Read Full Article)

      Comment Mentions:   Amazon.com   James Hamilton

    22. PUE and Total Power Usage Efficiency (tPUE)

      Explore Perspectives (Jun 14 2009)

      I like Power Usage Effectiveness as a coarse measure of infrastructure efficiency. It gives us a way of speaking about the efficiency of the data center power distribution and mechanical equipment without having to qualify the discussion on the basis of server and storage used or utilization levels, or other issues not directly related to data center design. But, there are clear problems with the PUE metric. Any single metric that attempts to reduce a complex system to a single number is going to both fail to model important details and be easy to game. PUE suffers from some of both. Nonetheless, I find it useful. In what follows, I give an overview of PUE, talk about some of the issues I have with it as currently defined, and then propose some improvements in PUE measurement using a metric called tPUE. What is PUE? PUE is defined in ... (Read Full Article)

      Comment Mentions:   Amazon.com   The Green Grid   James Hamilton