Spot Instances, Big Clusters, & the Cloud at Work
Explore Perspectives (Sep 20 2011) Cloud Computing
If you have read this blog in the past, you’ll know I view cloud computing as a game changer (Private Clouds are not the Future) and spot instances as a particularly powerful innovation within cloud computing. Over the years, I’ve enumerated many of the advantages of cloud computing over private infrastructure deployments. A particularly powerful cloud computing advantage comes from noting that when a large number of non-correlated workloads are combined, the overall infrastructure utilization is far higher for most workload combinations. This is partly because the reserve capacity needed to ensure that all workloads can support peak demands is a tiny fraction of what is required to provide reserve surge capacity for each job individually.
This factor alone is a huge gain, but an even bigger gain can be found by noting that all workloads are cyclic and go through sinusoidal capacity peaks and troughs. Some ...
(Read Full Article)
Comment Mentions: Amazon.com James Hamilton
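The capacity argument in the excerpt above can be illustrated with a small sketch. It assumes four hypothetical workloads, each a pure 24-hour sinusoid with the same base and amplitude but peaks spread evenly around the clock. These numbers and the `workload` helper are illustrative assumptions, not data from the post, and real workloads are never this tidy.

```python
import math


def workload(hour, base, amplitude, phase_hours):
    """Demand of one idealized cyclic workload at a given hour of a 24-hour day."""
    return base + amplitude * math.sin(2 * math.pi * (hour - phase_hours) / 24)


# Hypothetical workloads: identical shape, peaks offset by 6 hours each.
phases = [0, 6, 12, 18]
hours = range(24)

# Capacity needed if each workload is provisioned for its own peak.
sum_of_peaks = sum(max(workload(h, 100, 50, p) for h in hours) for p in phases)

# Capacity needed if all workloads share one pool sized for the combined peak.
peak_of_sum = max(sum(workload(h, 100, 50, p) for p in phases) for h in hours)

print(round(sum_of_peaks), round(peak_of_sum))
```

In this deliberately favorable case the shared pool needs about a third less capacity than separate provisioning, because the peaks and troughs cancel. Less perfectly offset workloads would show a smaller gain.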
2011 European Data Center Summit
Explore Perspectives (May 25 2011)
The European Data Center Summit 2011 was held yesterday at SihlCity CinCenter in Zurich. Google Senior VP Urs Hoelzle kicked off the event talking about why data center efficiency was important both economically and socially. He went on to point out that the oft-quoted number that US data centers represent 2% of total energy consumption is usually misunderstood. The actual data point is that 2% of the US energy budget is spent on IT, of which the vast majority is client-side systems. This is unsurprising but a super important clarification. The full breakdown of this data:
· 2% of US power
o Datacenters: 14%
o Telecom: 37%
o Client Device: 50%
The net is that 14% of 2%, or 0.28%, of the US power budget is consumed in datacenters. This is a far smaller but still very relevant number. In fact, that is the primary motivator ...
(Read Full Article)
Comment Mentions: James Hamilton
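The arithmetic in the summit excerpt above can be checked in a few lines. The variable names are illustrative, and the percentages are the ones quoted in the excerpt.

```python
# Share of total US power spent on IT, as quoted in the excerpt.
it_share_of_us_power = 0.02

# Breakdown of that IT share, as quoted in the excerpt.
breakdown = {"datacenters": 0.14, "telecom": 0.37, "client_devices": 0.50}

# Each segment's share of total US power is the product of the two fractions.
shares = {name: it_share_of_us_power * frac for name, frac in breakdown.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2%} of US power")
```

The three quoted fractions sum to 101%, presumably because of rounding in the source figures.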
European Data Center Efficiency Summit
Explore Perspectives (Apr 29 2011) Power and Cooling
Google cordially invites you to participate in a European Summit on sustainable Data Centres. This event will focus on energy-efficiency best practices that can be applied to multi-MW custom-designed facilities, office closets, and everything in between. Google and other industry leaders will present case studies that highlight easy, cost-effective practices to enhance the energy performance of Data Centres. The summit will also include a dedicated session on cooling. Presenters will detail climate-specific implementations of free cooling as well as novel ways to utilise locally available opportunities. We will also debate climate-independent PUE targets. The agenda includes presentations and panel discussions featuring Amazon, DeepGreen, eBay, Google, IBM, Microsoft, Norman Disney & Young, PlusServer, Telecity Group, The Green Grid, UK's Chartered Institute for IT, UBS and others. Attendance is free. However, space is limited and we therefore encourage you to register online at your earliest convenience. Your participation will be confirmed. We ... (Read Full Article)
Comment Mentions: Amazon.com Google The Green Grid
More Data on Datacenter Air Side Economization
Explore Perspectives (Mar 15 2011) Servers
Two of the highest leverage datacenter efficiency improving techniques currently sweeping the industry are: 1) operating at higher ambient temperatures and 2) air-side economization with evaporative cooling.
The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) currently recommends that servers not be operated at inlet temperatures beyond 81F.
(Read Full Article)
Comment Mentions: ASHRAE James Hamilton
Yahoo! Compute Coop Design
Explore Perspectives (Mar 5 2011) Wind, Servers
Christina Page, Director of Climate & Energy Strategy at Yahoo! spoke at the 2010 Data Center Efficiency Summit where she presented Yahoo! Compute Coop Design.
The primary attributes of the Yahoo! design are: 1) 100% free air cooling (no chillers), 2) slab concrete floor, 3) use of wind power to augment air handling units, and 4) pre-engineered building for construction speed.
Christina reports that the idea of orienting the building so that the wind pressure on the external wall facing the dominant wind direction assists the air handling units was taken from looking at farm buildings in the Buffalo, New York area. An example given was the use of natural cooling in chicken coops.
(Read Full Article)
Comment Mentions: Yahoo James Hamilton Microsoft Corp
Exploring the Limits of Datacenter Temperature
Explore Perspectives (Feb 27 2011) Servers
Datacenter temperature has been ramping up rapidly over the last 5 years. In fact, leading operators have been pushing temperatures up so quickly that the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) recommendations have become a trailing indicator of what is being done rather than current guidance. ASHRAE responded in January of 2009 by raising the recommended limit from 77F to 80.6F (HVAC Group Says Datacenters Can be Warmer). This was a good move but many of us felt it was late and not nearly a big enough increment. Earlier this month, ASHRAE announced they are again planning to take action and raise the recommended limit further but haven’t yet announced by how much (ASHRAE: Data Centers Can be Even Warmer).
Many datacenters are operating reliably well in excess of even the newest ASHRAE recommended temp of 81F. For example, back in 2009 Microsoft announced they were operating ...
(Read Full Article)
Comment Mentions: Sun Microsystems ASHRAE James Hamilton
Speeding Up Cloud/Server Applications With Flash Memory
Explore Perspectives (Feb 6 2011)
Last week, Sudipta Sengupta of Microsoft Research dropped by the Amazon Lake Union campus to give a talk on the flash memory work that he and the team at Microsoft Research have been doing over the past year. It’s super interesting work. You may recall Sudipta as one of the co-authors on the VL2 Paper (VL2: A Scalable and Flexible Data Center Network) I mentioned last October.
Sudipta’s slides for the flash memory talk are posted at Speeding Up Cloud/Server Applications With Flash Memory and my rough notes follow:
· Technology has been used in client devices for more than a decade
· Server-side usage is more recent, and the differences between hard disk drive and flash characteristics bring some challenges that need to be managed in the on-device Flash Translation Layer (FTL) or in the operating system or application layers.
· Server requirements are more aggressive across several dimensions including ...
(Read Full Article)
Comment Mentions: Amazon.com Microsoft Research James Hamilton
Datacenter Networks are in my Way
Explore Perspectives (Oct 31 2010) Networking
I did a talk earlier this week on the sea change currently taking place in datacenter networks. In Datacenter Networks are in my Way I start with an overview of where the costs are in a high scale datacenter. With that backdrop, we note that networks are fairly low power consumers relative to the total facility consumption and not even close to the dominant cost. Are they actually a problem? The rest of the talk argues that networks are actually a huge problem across the board, including cost and power. Overall, networking gear lags behind the rest of the high-scale infrastructure world, blocks many key innovations, and actually is both a cost and a power problem when we look deeper.
The overall talk agenda:
· Datacenter Economics
· Is Net Gear Really the Problem?
· Workload Placement Restrictions
· Hierarchical & Over-Subscribed
· Net Gear: SUV of the Data Center
· Mainframe Business Model
· Manually Configured & Fragile at Scale ...
(Read Full Article)
Comment Mentions: Cisco James Hamilton
Netflix Migration to the Cloud
Explore Perspectives (Oct 10 2010) Cloud Computing
This morning I came across an article written by Sid Anand, an architect at Netflix, that is super interesting. I liked it for two reasons: 1) it talks about the move of substantial portions of a high-scale web site to the cloud, some of how it was done, and why it was done, and 2) it gives best practices on AWS SimpleDB usage.
I love articles about how high scale systems work. Some past postings:
FriendFeed use of MySQL
Facebook Cassandra Architecture and Design
Wikipedia Architecture
MySpace Architecture and .Net
Flickr DB Architecture
Geo-Replication at Facebook
Scaling at LucasFilms
Facebook: Needle in a Haystack: Efficient Storage of Billions of Photos
Scaling LinkedIn
Scaling at MySpace
The article starts off by explaining why Netflix decided to move their infrastructure to the cloud:
Circa late 2008, Netflix had a single data center. This single data center raised a few concerns. As a ...
(Read Full Article)
Comment Mentions: Amazon.com Oracle James Hamilton
Scaling AWS Relational Database Service
Explore Perspectives (Oct 9 2010)
Hosting multiple MySQL engines with MySQL Replication between them is a common design pattern for scaling read-heavy MySQL workloads. As with all scaling techniques, there are workloads for which it works very well but there are also potential issues that need to be understood. In this case, all write traffic is directed to the primary server and, consequently, is not scaled, which is why this technique works best for workloads heavily skewed towards reads. But, for those fairly common read-heavy workloads, the technique works very well and allows scaling the read workload across a fleet of MySQL instances. Of course, as with any asynchronous replication scheme, the read replicas are not transactionally updated. So any application running on MySQL read replicas must be tolerant of eventually consistent updates. Load balancing high read traffic over multiple MySQL instances works very well but this is only one of the ... (Read Full Article)
Comment Mentions: Amazon.com James Hamilton Facebook
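The read-scaling pattern described in the excerpt above can be sketched as a tiny router. `ReplicatedRouter`, the server names, and the "starts with SELECT" test are all hypothetical simplifications for illustration. This is not how MySQL Replication or Amazon RDS is configured, and a real application would also have to tolerate replicas lagging the primary.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the primary and spread reads over replicas (illustrative only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; reads may observe slightly stale data.
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Naive classification: treat anything starting with SELECT as a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._replicas) if is_read else self.primary


router = ReplicatedRouter("primary", ["replica-1", "replica-2"])
```

Adding replicas widens the read path only. Every write still lands on the single primary, which is why the technique suits workloads heavily skewed towards reads.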
Overall Data Center Costs
Explore Perspectives (Sep 18 2010) Cloud Computing, Servers
A couple of years ago, I did a detailed look at where the costs are in a modern, high-scale data center. The primary motivation behind bringing all the costs together was to understand where the problems are and find those easiest to address. Predictably, when I first brought these numbers together, a few data points just leapt off the page: 1) at scale, servers dominate overall costs, and 2) mechanical system cost and power consumption seem unreasonably high. Both of these areas have proven to be important technology areas to focus upon and there has been considerable industry-wide innovation, particularly in cooling efficiency, over the last couple of years.
I posted the original model at the Cost of Power in Large-Scale Data Centers. One of the reasons I posted it was to debunk the often repeated phrase “power is the dominant cost in a large-scale data center”. Servers dominate with ...
(Read Full Article)
Comment Mentions: Google James Hamilton
Energy Proportional Datacenter Networks
Explore Perspectives (Aug 1 2010) Cloud Computing, Networking, Servers
A couple of weeks back Greg Linden sent me an interesting paper called Energy Proportional Datacenter Networks. The principle of energy proportionality was first coined by Luiz Barroso and Urs Hölzle in an excellent paper titled The Case for Energy-Proportional Computing. The core principle behind energy proportionality is that computing equipment should consume power in proportion to its utilization level. For example, a computing component that consumes N watts at full load should consume X/100*N watts when running at X% load. This may seem like an obviously important concept but, when the idea was first proposed back in 2007, it was not uncommon for a server running at 0% load to be consuming 80% of full load power. Even today, you can occasionally find servers that poor. The incredible difficulty of maintaining near 100% server utilization makes energy proportionality a particularly important concept.
One of the wonderful aspects ...
(Read Full Article)
Comment Mentions: Cisco James Hamilton
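The X/100*N rule in the excerpt above is easy to put into numbers. The sketch below compares an ideally proportional component with one that has a fixed idle floor. The linear interpolation between idle and full load, the function names, and the 300 W figure are assumptions made for illustration. Only the 80% idle figure comes from the excerpt.

```python
def ideal_power(full_load_watts, utilization_pct):
    """Energy-proportional power draw: X/100 * N watts at X% load."""
    return utilization_pct / 100 * full_load_watts


def linear_power(full_load_watts, idle_fraction, utilization_pct):
    """Assumed linear model: a fixed idle floor plus a proportional remainder."""
    idle_watts = idle_fraction * full_load_watts
    return idle_watts + (full_load_watts - idle_watts) * utilization_pct / 100


# Hypothetical 300 W server running at 30% load.
proportional = ideal_power(300, 30)
poor = linear_power(300, 0.80, 30)
print(f"ideal: {proportional:.0f} W, 80% idle floor: {poor:.0f} W")
```

Under these assumptions the non-proportional server draws close to full power while doing less than a third of the work, which is why low utilization makes proportionality matter so much.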
The New World Order
Explore Perspectives (Jun 7 2010) Cloud Computing
Industry trends come and go. The ones that stay with us and have lasting impact are those that fundamentally change the cost equation. Public clouds clearly pass this test. The potential savings approach 10x and, in cost-sensitive industries, those that move to the cloud fastest will have a substantial cost advantage over those that don’t.
And, as much as I like saving money, the much more important game changer is speed of execution. Those companies depending upon public clouds will be noticeably more nimble. Project approval to delivery times fall dramatically when there is no capital expense to be approved. When the financial risk of new projects is small, riskier projects can be tried. The pace of innovation increases. Companies where innovation is tied to the financial approval cycle and the hardware ordering to install lag are at a fundamental disadvantage.
Clouds change companies for the better, clouds drive down ...
(Read Full Article)
Comment Mentions: James Hamilton
PUE is Still Broken and I still use it
Explore Perspectives (May 25 2010) Servers
PUE is still broken and I still use it. For more on why PUE has definite flaws, see: PUE and Total Power Usage Efficiency. However, I still use it because it’s an easy to compute summary of data center efficiency. It can be gamed endlessly but it’s easy to compute and it does provide some value.
Improvements are underway in locking down the most egregious abuses of PUE. Three were recently summarized in Technical Scribblings RE Harmonizing Global Metrics for Data Center Energy Efficiency. In this report from John Stanley, the following were presented:
· Total energy to include all forms of energy whether electric or otherwise (e.g. gas fired chiller must include chemical energy being employed). I like it but it’ll be a challenge to implement
· Total energy should include lighting, cooling, and all support infrastructure. We already knew this but it’s worth clarifying since ...
(Read Full Article)
Comment Mentions: Data Center Efficiency James Hamilton
Computer Room Evaporative Cooling
Explore Perspectives (May 14 2010) Servers
I recently came across a nice data center cooling design by Alan Beresford of EcoCooling Ltd. In this approach, EcoCooling replaces the CRAC units with a combined air mover, damper assembly, and evaporative cooler. I’ve been interested in evaporative coolers and their application to data center cooling for years and they are becoming more common in modern data center deployments (e.g. Data Center Efficiency Summit).
An evaporative cooler is a simple device that cools air by taking water through a state change from liquid to vapor. They are incredibly cheap to run and particularly efficient in locales with lower humidity. Evaporative coolers can allow the power intensive process-based cooling to be shut off for large parts of the year. And, when combined with favorable climates or increased data center temperatures, they can entirely replace air conditioning systems. See Chillerless Datacenter at 95F; for a deeper discussion see Costs of ...
(Read Full Article)
Comment Mentions: James Hamilton
Inter-Datacenter Replication & Geo-Redundancy
Explore Perspectives (May 10 2010) Cloud Computing
Wide area network costs and bandwidth shortage are the single most common reason why many enterprise applications run in a single data center. Single data center failure modes are common. There are many external threats to single data center deployments including utility power loss, tornado strikes, facility fire, network connectivity loss, earthquake, break in, and many others I’ve not yet been “lucky” enough to have seen. And, inside a single facility, there are simply too many ways to shoot one’s own foot. All it takes is one well intentioned networking engineer to black hole the entire facility’s networking traffic. Even very high quality power distribution systems can have redundant paths taken out by fires in central switch gear or cascading failure modes. And, even with very highly redundant systems, if the redundant paths aren’t tested often, they won’t work. Even with incredible redundancy, just having the ...
(Read Full Article)
Comment Mentions: Amazon.com James Hamilton
Is Sandia National Lab's Red Sky Really Able to Deliver a PUE of 1.035?
Explore Perspectives (Nov 22 2009) Supercomputer
Some time back I whined that Power Usage Effectiveness (PUE) is a seriously abused term: PUE and Total Power Usage Efficiency. But I continue to use it because it gives us a rough way to compare the efficiency of different data centers. It’s a simple metric that takes the total power delivered to a facility (total power) and divides it by the amount of power delivered to the servers (critical power or IT load). A PUE of 1.35 is very good today. Some datacenter owners have claimed to be as good as 1.2. Conventionally designed data centers operated conservatively are in the 1.6 to 1.7 range. Unfortunately most of the industry has a PUE of over 2.0, some are as bad as 3.0, and the EPA reports the industry average is 2.0 (Report to Congress on Server Data Center Efficiency).
A PUE of ...
(Read Full Article)
Comment Mentions: Intel Norway Lawrence Livermore National Laboratory
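The PUE definition in the excerpt above (total facility power divided by power delivered to the IT load) can be written out directly. The function names and the 1,000 kW IT load are illustrative assumptions. The PUE values are the ones quoted in the excerpt.

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power over IT (critical) load."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(pue_value, it_load_kw):
    """Power spent on distribution losses, cooling, and so on for a given PUE."""
    return (pue_value - 1) * it_load_kw


# Hypothetical facility delivering 1,000 kW to servers.
for quoted in (1.2, 1.35, 1.7, 2.0, 3.0):
    print(f"PUE {quoted}: {overhead_kw(quoted, 1000):.0f} kW of overhead")
```

At a PUE of 2.0 the facility spends as much power on overhead as on the servers themselves. At 1.35 the overhead is roughly a third of the IT load.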
Stanford Clean Slate CTO Summit
Explore Perspectives (Oct 24 2009) Cloud Computing, Networking, Servers
I attended the Stanford Clean
Slate CTO Summit last week. It
was a great event organized by Guru
Parulkar.
Here’s the agenda:
12:00: State of Clean Slate -- Nick McKeown, Stanford
12:30: Software defined data center networking -- Martin Casado, Nicira
1:00: Role of OpenFlow in data center networking -- Stephen Stuart, Google
2:30: Data center networks are in my way -- James Hamilton, Amazon
3:00: Virtualization and Data Center Networking -- Simon Crosby, Citrix
3:30: RAMCloud: Scalable Datacenter Storage Entirely in DRAM -- John Ousterhout, Stanford
4:00: L2.5: Scalable and reliable packet delivery in data centers -- Balaji Prabhakar, Stanford
4:45: Panel: Challenges of Future Data Center Networking -- Panelists: James Hamilton, Stephen Stuart, Andrew Lambeth (VMware), Marc Kwiatkowski (Facebook)
I presented Networks
are in my Way. My basic
premise is that networks are both expensive and poor power/performers. But, much more
important, they are in ...
(Read Full Article)
Comment Mentions: Amazon.com James Hamilton Facebook
Successfully Challenging the Server Tax
Explore Perspectives (Sep 3 2009) Servers, Storage
The server tax is what I call the mark-up
applied to servers, enterprise storage, and high scale networking gear. Client
equipment is sold in much higher volumes with more competition and, as a consequence,
is priced far more competitively. Server gear, even when using many of the same components
as client systems, comes at a significantly higher price. Volumes are lower, competition
is less, and there are often many lock-in features that help maintain the server tax. For
example, server memory subsystems support Error
Correcting Code (ECC) whereas
most client systems do not. Ironically both are subject to many of the same memory
faults and the cost of data corruption in a client before the data is sent to a server
isn’t obviously less than the cost of that same data element being corrupted on the
server. Nonetheless, server components typically have ECC while commodity client systems
usually do ...
(Read Full Article)
Comment Mentions: Amazon.com James Hamilton
Pictures from the Fisher Plaza Data Center Fire
Explore Perspectives (Jul 10 2009) Cloud Computing
There have been many reports of the Fisher Plaza data center fire. An early one was the Data Center Knowledge article: Major Outage at Seattle Data Center. Data center fires aren’t as rare as any of us would like but this one is a bit unusual in that fires normally happen in the electrical equipment or switchgear whereas this one appears to have been a bus duct fire. The bus duct fire triggered the sprinkler system. Several sprinkler heads were triggered and considerable water was sprayed making it more difficult to get the facility back online quickly. Several good pictures showing the fire damage were recently published in Tech Flash Photos: Inside the Fisher Fire. ... (Read Full Article)
Comment Mentions: Amazon.com Data Center Knowledge James Hamilton
ISCA 2009 Keynote II: Internet-Scale Service Infrastructure Efficiency
Explore Perspectives (Jun 24 2009) Cloud Computing
I presented the keynote at the International Symposium on Computer Architecture 2009 yesterday. Kathy Yelick kicked off the conference with the other keynote on Monday: How to Waste a Parallel Computer. Thanks to ISCA Program Chair Luiz Barroso for the invitation and for organizing an amazingly successful conference. I’m just sorry I had to leave a day early to attend a customer event this morning. My slides: Internet-Scale Service Infrastructure Efficiency. Abstract: High-scale cloud services provide economies of scale of five to ten over small-scale deployments, and are becoming a large part of both enterprise information processing and consumer services. Even very large enterprise IT deployments have quite different cost drivers and optimization points from internet-scale services. The former are people-dominated from a cost perspective whereas internet-scale service costs are driven by server hardware and infrastructure with people costs fading into the noise at less than 10%. In this ... (Read Full Article)
Comment Mentions: Amazon.com James Hamilton
PUE and Total Power Usage Efficiency (tPUE)
Explore Perspectives (Jun 14 2009) Servers
I like Power Usage Effectiveness as a coarse measure of infrastructure efficiency. It gives us a way of speaking about the efficiency of the data center power distribution and mechanical equipment without having to qualify the discussion on the basis of server and storage used, utilization levels, or other issues not directly related to data center design. But there are clear problems with the PUE metric. Any single metric that attempts to reduce a complex system to a single number is going to both fail to model important details and be easy to game. PUE suffers from some of both. Nonetheless, I find it useful.
In what follows, I give an overview of PUE, talk about some of the issues I have with it as currently defined, and then propose some improvements in PUE measurement using a metric called tPUE.
What is PUE?
PUE is defined in ...
(Read Full Article)
Comment Mentions: Amazon.com The Green Grid James Hamilton






