Bill Worley, Chief Technology Officer, Secure64 Software Corporation
December 9th, 2008 By Stephen Northcutt
At larger conferences, the SANS Institute has a
vendor show, and I like to attend to find out about new companies and
new technology. There was a vendor at our last show in Las Vegas,
Secure64. I had never heard of them, so I wandered over and we had a
great chat. They are a DNSSEC vendor who sells a product built on
HP's Itanium-based servers. The more they talked, the more I learned about
an incredible guy, a security thought leader named Bill Worley, so
please let me introduce you to Bill.
Bill started programming in 1959; he started as a math and physics
student and ended up in computers. Today, Dr. William (Bill) Worley Jr.
is the CTO of Secure64 Software Corporation. He is a retired HP Fellow
(Chief Scientist and Distinguished Contributor), and served as a
Commissioner of Colorado Governor's Science and Technology Commission.
He received an MS (Physics) and MS (Information Science) from the
University of Chicago, and a PhD (Computer Science) from Cornell
University. However, Bill is, at heart, a system architect.
That certainly counts for street credibility, Bill; what
is your background? What led you to focus on the Itanium architecture?
As I said, computer architecture is my primary interest. In the mid
1960’s, while a graduate student and consultant to IBM
Research and SRA, I was a principal, together with two Stanford
graduate students, in architecting a novel 16-bit system. The system
was actually built by IBM, and we tried to convince IBM to market small
business applications on this system. But IBM’s market
research then concluded that there was no market for such a "personal
computer." I later worked for IBM for 13 years on mainframe,
storage, and other advanced architectures. A particular privilege was
working in the mid 1970’s with Dr. John Cocke and the IBM
Yorktown Heights Research team on the 801 architecture, the first of
the RISC architectures.
OK, and according to Wikipedia,
"the acronym RISC (pronounced risk), for reduced instruction set
computing, represents a CPU design strategy emphasizing the insight
that simplified instructions which 'do less' may still provide for
higher performance if this simplicity can be utilized to make
instructions execute very quickly." So, you went as far as you could with
architecture at IBM and decided to move on; what happened next?
Just an added point. RISC architectures derive their advantage not only
from enabling higher operating clock frequencies, but also from the
fact that the most frequently executed machine instructions, even in
more complex architectures, are the simplest ones. RISC architectures,
therefore, made it possible to build a smaller, simpler processor
entirely in hardware, to integrate the whole processor on a single
silicon chip, and to deliver superior price and performance.
I left IBM and joined HP Labs at the end of 1981. My first job was to
lead the development of HP’s RISC architecture, later
christened PA-RISC. This architecture offered advances in RISC
architecture, described in papers published at the time, and became an
architecture of convergence for HP. We also discovered how to move
applications written for the earlier HP-3000 architecture, a very
successful 16-bit stack oriented architecture, to PA-RISC by
translating the HP-3000 binary programs. DEC later employed this
technique successfully to migrate applications from their VAX
architecture to their Alpha RISC architecture. The earliest PA-RISC
systems shipped in the mid 1980’s, and the PA-RISC product
line continued to evolve.
By 1990, the question had become how HP could leapfrog competitive
RISC systems. HP adopted two avenues of approach. The first was to
develop advanced CMOS processors that made major advances in
parallelism and clock frequencies. An effort led by Denny Georg, now
Secure64’s Chairman, developed the HP processor family,
internally known as the "Snakes" processors. These processors set new
records for performance and price/performance. The second approach was
to undertake a deep study in HP Labs to understand how RISC processors
would continue to evolve, what fundamental barriers they would
encounter, and to explore architecture innovations that could surpass
RISC architectures in the long run.
At this point in time RISC was clearly encountering
limitations. RISC architectures are inherently sequential,
and straightforward implementations of these architectures at best
sustain an instruction execution rate of one instruction per cycle. It
became axiomatic that further significant advances in performance
require a processor to be able to execute multiple instructions per
cycle.
In the 1960’s an IBM program known as the Advanced Computing
System (ACS) had delved deeply into these
questions. This was the first effort to prove that it is
possible to design a processor that can issue multiple instructions per
cycle, in an order of execution that differed from the sequential,
in-order execution defined by the instruction set architecture. These
results had been published by 1971 in a paper by Dr. Herb Schorr, who
had been the architecture director of the ACS effort.
Early in the 1990’s IBM introduced the RS/6000, a RISC
processor capable of executing more than a
single instruction per cycle. Similar multiple issue capabilities were
included in the HP Snakes processors, and the conventional wisdom in
the early 1990’s was that the future for RISC
architectures would be implementations capable of executing multiple
instructions per cycle, not in the prescribed order of the sequential
RISC architecture definition. Such implementations came to be called
"out-of-order superscalar" implementations.
Our research in HP Labs focused both upon the mathematics and the
practicalities of out-of-order multiple issue processors. We also
focused on alternative architectural approaches that could achieve
levels of instruction parallelism higher than those attainable by
out-of-order super-scalar RISC processors.
We reached several basic conclusions about out-of-order superscalar
processors. First, such implementations are extremely complex, with
very long execution pipelines; and the resulting concurrent execution
levels were less than hoped for. Second, there are mathematical
barriers that prevent such processors from ever reaching high levels of
instruction parallelism. Thousands of comparators must execute and have
their results correlated every instruction cycle for even modest levels
of multiple instruction issue, and this hardware complexity grows
quadratically. Third, the silicon area required to re-order
and issue multiple instructions is huge. For the PA-8000,
HP’s first 64-bit out-of-order superscalar processor, for
example, the number of gates required for the re-order and issue buffer
was as large as that required for the entire previous generation
processor chip. And lastly, after having built an out-of-order
processor, it is still necessary for a compiler to schedule the
instructions meticulously to realize even modest levels of instruction
execution parallelism.
Bottom line, even with a highly complex out-of-order-super-scalar
processor, and significant silicon chip area devoted to out-of-order
multiple instruction issue, the compiler still must work hard
to schedule instructions, and the achievable levels of parallelism
remain sharply bounded.
The alternative architecture approach we adopted in HP Labs was first
to eliminate the bottleneck of sequential instruction encoding, and
then to focus upon the problems of providing a richer set of computing
resources, minimizing the effects of memory stalls and pipeline delays,
reducing hardware complexity, and strengthening security capabilities.
For traditional RISC architectures, one has a compiler that
can see all the opportunities for execution parallelism. However, the
compiler is constrained to map the computation into a sequentially
executed stream of instructions. The hardware, then, must dynamically
re-examine a relatively small excerpt of the sequential stream of
instructions to rediscover opportunities to execute instructions in
parallel or in a better order. Clearly the sequential instruction
schedule is a bottleneck. A more promising approach is to keep the
hardware as simple as possible, use all of the silicon area for caches,
buffers, execution units, or other hardware components that contribute
directly to computational throughput, and permit the compiler to
produce a parallel instruction schedule. In other words, encode
concurrent instructions simply and make the instruction level
parallelism explicit. This approach became known as EPIC, explicitly
parallel instruction computing. The initial architecture defined by
the HP Labs team was called PA-WW (PA Wide Word).
Even with a much richer set of registers and functional units the
silicon area for such a system will be smaller than that required for
an out-of-order RISC implementation in the same silicon technology. In
fact, the active computing core of the first Itanium2 chip was about
half the size of the active computing core of the comparable Intel X86
chip.
The difficult problems for an EPIC processor implementation were
minimizing the effects of memory latencies and instruction execution
pipeline disruptions. These problems were addressed by several
innovations described in the published papers about the architecture.
Basically, greater on-chip areas are available for caches and more
execution units, more explicit control is provided for caching and
initiating memory operations speculatively and earlier in the execution
stream, execution pipelines are far shorter, and predication enables
many conditional operations to be performed without using branch
instructions - avoiding execution pipeline delays.
The architecture also was enriched by automatic means for managing the
large register sets, compactly scheduling software pipelined
computations by explicit register renaming, dynamically controlling
fine-grained access to sets of pages in memory, and storing critical
control information in areas of memory inaccessible to executing
software. These latter capabilities significantly enhanced the
architecture’s capability to support a highly secure system.
By late 1993, HP had decided to proceed with EPIC architecture and
approached Intel to see if they would like to partner. It didn't make
economic sense for HP to try to get into the silicon business or to
launch a new proprietary architecture. Technical and business study
groups from both corporations came together and soon saw the advantages
to a partnership. That is how the Itanium effort began, and how Intel
and HP began working together.
There were growing pains during the initial system roll outs, but
today, Intel Itanium processors are the foundation for servers for the
huge Enterprise mission-critical server market. According to a mid-2008
report by IDC, "Itanium is the fastest growing server platform in the
world." On October 29, 2008, HP announced major successes in migrating
European customers from legacy mainframe systems to HP Integrity
(Itanium based) servers, achieving data center cost savings of "up to
70 percent," and that many similar migrations are expected in the "next
12 months." According to IDC, HP "tied for the top position in revenue
for the Europe, Middle East, and Africa Enterprise server market,
securing 32.7 percent market share as measured in the second quarter of
2008."[1] The article also cited "Rapid adoption of Intel®
Itanium® processor-based platforms" and "HP Integrity
Innovations continue to grow."
So Bill, as I understand it, when you retired, you headed
into your workshop to create a new operating system; what was your goal?
One of the HP Labs goals was to build into the EPIC architecture
capabilities that would make the system more secure. For example, one
can store critical control information in a memory area that is
protected by "Protection Key". A protection key permits one to
tag a page of memory with a 24 bit identifier and, unless a processor
register contains a matching identifier, executing code cannot access
that memory - no matter what its privilege level.
Even while at HP I advocated work to create operating systems that have
a secure architecture by fully utilizing unique Itanium hardware
capabilities. After retiring I met some folks here in Denver who also
were interested in developing systems that had unprecedented security
and performance. This is what we’ve done at Secure64. Our
system objectives were to make the system and its applications immune
to compromise from malware and rootkits, and resistant to network
attacks. We call such a system "Genuinely Secure."
A genuinely secure system is quite unlike a hardened OS that is
configured to minimize its attack surface and its exposures to
vulnerabilities. To be honest with you, I believe that we never are
going to be able to achieve required security levels in complex general
purpose operating systems such as Microsoft Windows, UNIX, and Linux.
The fundamental problem with these operating systems is they are
monstrously complex, having added layers and layers of abstraction to a
fundamentally weak foundation running on a weak hardware protection
structure - these systems don’t even use all of the
protection capabilities offered in the underlying hardware
architectures. Portability and backwards compatibility have been major
constraints, and eventually the inertia of the total investment
militates against evolving to a more solid foundation. Eventually,
after one creates so many layers of abstraction, no one understands
anymore what actually is going on.
What we have today is a continuing cycle of vulnerability discovery,
exploitation, and patching, both in operating systems and in
applications. Most systems cannot survive when directly connected to
the Internet. They must be surrounded by a bodyguard of protective
devices such as firewalls, intrusion detection systems, flood
protection systems, etc., each of which is itself built on similar
foundations and carries its own vulnerabilities. This is a rich target for
vandals, criminals, and hostile nation states. I believe the infusion
of virtualization into this situation will be found actually to
increase, rather than reduce, system attack surfaces. Chris Whitener of
HP, for example, has pointed out that snapshots of virtual machines,
taken to suspend operation or migrate the virtual machine to another
system, offer a rich harvest of sensitive material stored in the clear,
and ripe for attack.
So you focus on the security capabilities of the Itanium
architecture; what are some of the things you do?
For a long time I have believed it is possible to design and build a system
with extremely strong security properties, particularly when one can
design such a system to run on a hardware architecture that supports
advanced capabilities for security. We often have been asked why we run
on Itanium. The simple answer is that Itanium has unique properties
that are essential for a genuinely secure system. We have white papers
on our website
explaining these matters.
At Secure64, we took a step back, and built the entire system, from the
ground up, for security, simplicity, and minimal complexity. We were
guided by solid design principles, and believe the level of security we
have achieved is unmatched. One basic principle is that every piece of
code executing in our system be backed by a chain of trust. We know
what code is running, that it has not been contaminated, and that it
has been authenticated by cryptographic signature verification each
time the system boots. Code being executed is set "execute
only", which is a hardware protection setting provided in
Itanium, so executable code can neither be read nor written by any
other executing software. Further, no other executable code can be
injected while the system is running. This clearly is not fully
general, but it is sufficient for our product applications. From this
minimal complexity, we gain the assurance that no malware or rootkits
can be injected into the system.
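To make the chain-of-trust idea concrete, here is a minimal sketch in Python of the kind of check it implies: refuse to load any code image whose detached signature does not verify against a trusted public key. The function name, the use of RSA with SHA-256, and the PKCS#1 v1.5 padding are illustrative assumptions, not Secure64's actual implementation.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def image_is_trusted(image: bytes, signature: bytes, pem_public_key: bytes) -> bool:
        """Return True only if the image's detached signature verifies."""
        public_key = serialization.load_pem_public_key(pem_public_key)  # assumes an RSA key
        try:
            public_key.verify(signature, image, padding.PKCS1v15(), hashes.SHA256())
            return True
        except InvalidSignature:
            return False

    # A loader enforcing a chain of trust would refuse to map any image
    # for which image_is_trusted() returns False.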
To be sure, in every architecture there is always one highest privilege
level where code can execute any of the machine instructions. Our
guiding principle here is to minimize such code. Code at this
level is limited to basic mechanisms that absolutely require privileged
instructions to perform their tasks. We use all four privilege levels
supported by the Itanium architecture, and keep the code executing at
the highest privilege level to a minimum. Rather than having
tens of thousands, or millions of lines of source code that generate
instructions that execute at the highest privilege level, we have only
a few thousand source lines. It is comprehensible by a single person,
and we intend at some point to publish this source code for expert and
peer review.
OK, I understand, secure and simple, so that is why you
called it a "Micro Operating System"?
We call our SourceT® a micro operating system, since it does
control the system and is so much smaller than anything else out there.
We tried to explain it to people without using the term "operating
system," but they did not understand; we often were asked whether it
"ran on Windows or Linux." When we began using the "micro
operating system" term, folks got the idea, so it enabled more
effective communication. A major design goal was to completely
eliminate malware. The system also is designed to protect itself from
all the types of attacks that currently are launched via the Internet -
without using firewalls, IDS’s, IPS’s, or the like.
The idea was to change the game so that none of the current remote
attacks work anymore.
I did my homework and found the security of your micro OS was
evaluated by Matasano; here is their executive
summary. Would you like to make any additional comments?
Matasano was tasked by Secure64 and Intel to evaluate critically the
claim that, for remote attackers, SourceT is "immune to all forms of
malware, including rootkits, Trojans, viruses and worms." To do this,
Matasano evaluated the SourceT architecture against three areas of
vulnerability: code injection, privilege level escalation, and
alteration or subversion of the trusted boot process. These areas were
selected because they comprise the strategies that typical malware, such as
worms, spyware, or Trojan horse applications, uses to introduce arbitrary
code into a computing system. Their report is on our website.
According to an article I read on the Register, "Despite
HP's healthy rise in Itanium server sales, IBM think its major rival
and Intel will have to give up on the processor in the near future due
to basic economics. "The end of life for Itanium will occur in the next
five years," IBM VP Scott Handy told us, during an interview here in
Austin, Texas. "(HP) will have to announce some kind of transition."
Will any of these security
innovations matter if Itanium is not a success in the marketplace? I am
told that Beta was superior to VHS and that just did not appear to
matter.
Presently no other hardware architecture can offer the security
capabilities of Itanium. Many of the techniques we employ could not
function on current mainframe, CISC, or RISC architectures. We believe
that some of the present cyber security problems can only be met by
employing security capabilities of this type, and have made such
suggestions to various working groups on National cyber security.
I believe, though, that the IBM prognostication is wishful thinking on
their part. I know the Register and other trade press have been
predicting the demise of Itanium since the late
1990’s. But Itanium systems are stronger than ever, have set
numerous world records for performance and price/performance, and today
continue to increase their penetration of the Enterprise
mission-critical server market. It is hard to believe that Intel, HP,
Microsoft, and other manufacturers of Itanium-based systems will decide
to abandon this lucrative market. It is also of note that HP supports a
very broad range of hardware and operating systems on Itanium - not
only HP-UX, Windows, and Linux, but also VMS and the 99.99999%
available NonStop system, together with virtualization offerings.
A further point is that because Itanium is so highly parallel it is
unsurpassed as a micro-emulator of earlier mainframe and RISC
architectures. Intel at one of its IDF forums demonstrated execution of
SPARC binaries on Itanium at a performance exceeding that of then
extant SPARC processors. X86 and PA-RISC binaries also execute at
competitive speeds in this manner. Recently IBM bought a startup named
"Platform Solutions" that had shown that IBM mainframe system and
application software ran at dangerously competitive performance levels
on Itanium. Perhaps your IBM spokesperson can explain why IBM chose not
to compete with them in the open marketplace.
Thank you for that Bill. Now that we have a secure operating system as
a foundation, you need an application to put on it; why did you find
DNSSEC interesting?
There is broad agreement that a secure DNS system is essential to the
Nation’s, and to the world’s, cyber infrastructure.
This fall, the Black Hat conference paper by Dan Kaminsky,
demonstrating a major vulnerability in DNS caching servers and
elaborating the manifold types of cyber attacks that can result from
this vulnerability, raised the world-wide consciousness of the
criticality of the security of the DNS system. At this point in time
there is general agreement that the DNS security extensions, DNSSEC,
constitute the only complete solution for eradicating these problems.
Ron Aitchison, a recognized DNS expert and author,[2] also points out
another important, though perhaps not as widely publicized, advantage of
DNSSEC - that SSL for the first time becomes totally trustworthy.
DNSSEC has been defined for a considerable time. Its deployment has
been delayed because of high costs and complexity. NIST has published a
103-page deployment guide, elaborating the procedures and best
practices needed to implement it. We have been working for
some time with Professor Dan Massey,
CSU, one of the authors of the DNS security extensions[3]. The basic
problem with DNSSEC is that the authors knew the systems hosting DNS
application software were so vulnerable that they could not be trusted
to keep the signing keys online. They had to manage the keys and do
the signing on systems that were offline. This is what led to the high
costs and complexity. While this approach is clumsy in all situations, in
some it simply does not work. There are firms, for example, that dynamically
create tens of new DNS entries per second. There is no effective way to
transport these dynamic DNS entries offline and back to get them signed
in a timely manner.
However, since Secure64’s SourceT micro operating system is
secure from the ground up, we can keep the keys online, and can protect
them with an Itanium memory protection key. Whenever a key is used in
plaintext form, it is kept in protected memory. Any key that is managed by
the system but not in protected memory is encrypted. We also have a TPM
chip as the platform root of trust, which we employ when
transporting root keys into protected memory.
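As a rough illustration of keeping only wrapped keys outside protected memory, the sketch below uses RFC 3394 AES key wrap from the Python cryptography package; the choice of key wrap here is my assumption for illustration, not a description of SourceT internals.

    import os
    from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

    kek = os.urandom(32)               # key-encryption key, held only in protected memory
    zone_signing_key = os.urandom(32)  # a key the system must manage

    # The wrapped form is safe to hold in ordinary memory or on disk.
    wrapped = aes_key_wrap(kek, zone_signing_key)
    assert aes_key_unwrap(kek, wrapped) == zone_signing_key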
Other than yourself, do you know of anyone making use of the
Trusted Platform Module (TPM), the tamper-resistant chip for securely
storing keys? It seems like it has been a long time coming. Most of us
have TPM chips in our laptops and desktops, and, yet, they are not used
for anything. Since MS Vista was not a commercial success, we never got
to see if BitLocker would get industry traction. What can you tell us
about TPM?
I am aware of a number of efforts in industry and government that use
or are planning to use a TPM. The TPM chip is a fundamental component
of Intel’s published security roadmap.[4] Many uses
of the TPM and a software stack built on top of it were specified over
several years without actually being implemented. There are lots of
specifications, and efforts now are beginning to implement elements of
these specifications. The hardware platform we presently use has
limited firmware support for the TPM, and our use of the TPM is very
basic. We use it only as a source of entropy and as the root of a
platform-unique tree of keys. This involves only the two required TPM
keys and a third key generated as the root of our key tree.
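To illustrate what a small tree of keys rooted in one platform secret can look like, here is a minimal sketch using HKDF; the purpose labels and the use of HKDF itself are assumptions made for illustration, not Secure64's actual derivation scheme.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def derive_subkey(root_secret: bytes, purpose: bytes) -> bytes:
        """Derive a 256-bit subordinate key bound to a purpose label."""
        return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                    info=purpose).derive(root_secret)

    root = bytes(32)  # placeholder; in practice released from the TPM-rooted hierarchy
    key_wrapping_key = derive_subkey(root, b"dnssec-key-wrapping")
    config_key = derive_subkey(root, b"configuration-encryption")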
I understand that uses of the TPM are farther advanced in other
platforms, but there seem to be persistent tactical and logistics
problems that remain to be solved before the full use envisioned for
the TPM becomes a reality.
What is the fundamental problem with DNS Security?
Well, Stephen, today there is no way to know for certain, when you send a
DNS request to look up an IP address (or set of IP addresses) for a URL and
get back a DNS response, that the returned IP address is the actual IP
address of the system you wish to reach.
The security extensions to DNS build chains of trust so you get both
an IP address, or set of IP addresses, and a digital signature that
authenticates the IP address(es).
Once we can implement this globally, then SSL actually becomes
trustworthy. Currently, there is the risk of a man-in-the-middle
attack. Once you have the security extensions, that can no longer
happen.
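For readers who want to see what such a check looks like in practice, here is a minimal sketch using the dnspython library (with its DNSSEC dependencies installed); the zone and resolver address are arbitrary examples, and a real validator would also walk the chain of trust up to the root.

    import dns.dnssec
    import dns.message
    import dns.name
    import dns.query
    import dns.rdataclass
    import dns.rdatatype

    zone = dns.name.from_text("example.com.")
    server = "8.8.8.8"  # any DNSSEC-aware resolver or authoritative server

    # Ask for the zone's DNSKEY RRset with the DNSSEC-OK bit set so the
    # server also returns the accompanying RRSIG records.
    query = dns.message.make_query(zone, dns.rdatatype.DNSKEY, want_dnssec=True)
    response = dns.query.udp(query, server, timeout=5)

    dnskeys = response.get_rrset(response.answer, zone, dns.rdataclass.IN,
                                 dns.rdatatype.DNSKEY)
    rrsigs = response.get_rrset(response.answer, zone, dns.rdataclass.IN,
                                dns.rdatatype.RRSIG, dns.rdatatype.DNSKEY)
    if dnskeys is None or rrsigs is None:
        raise SystemExit("server did not return signed DNSKEY data")

    # Raises dns.dnssec.ValidationFailure if the signature does not verify.
    dns.dnssec.validate(dnskeys, rrsigs, {zone: dnskeys})
    print("DNSKEY RRset signature verified for", zone)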
OK, you built a better mousetrap; how is it selling?
We've received very high levels of interest in this product. The US
government has made decisions to sign important zones by December 2009.
There is also high interest in Europe. We have talked with some
financial institutions whose auditors have concluded that they must
implement DNSSEC, and we have a product that makes this practical and
reasonable.
What do you think is the next great application for this
technology?
The DNS system is a global, reliable, available, redundant, distributed
database. Once you add security, you also have the component of trust.
All of the information in the system can be supported by digital
signatures. What other things might one keep in such a
database? We view the Secure DNS structure as a building block
for other applications, and for solving other problems.
What do you think about the micro OS and TPM as a
security solution for UDDI for SOA?
Actually, back in the early 90's, at HP Labs we postulated various
ideas of software as a service. HP chose not to pursue these ideas in
order not to offend major partner companies. However, the basic ideas
all postulated highly secure and trustworthy system elements as
essential components. We concluded that pervasive, highly-distributed
and highly-trusted networked systems would flourish if there were
strongly secured architectures in core elements.
Sometimes one must rethink problems, and fresh starts at the foundation
sometimes cannot be avoided. In my view, the US needs to rebuild much
of our national and industrial cyber infrastructure from the very
foundation - with security as the overarching objective. Only the
functions needed, no more. I think this has got to happen eventually,
and I honestly don’t think we can reach such a point by
evolving today’s popular and widely deployed systems.
I see continuing concerns and fear in trusting too much to
today’s networks and systems. There is mounting evidence for
concern nearly every day. We see White House computers hacked, as well
as cyber problems for both presidential campaigns. We need to undertake
the research to rethink the foundations.
Thank you so much Bill; can we ask you to share just a
bit about yourself? What do you like to do when you are not behind a
computer?
I enjoy playing the piano. I have been studying and playing since I was
seven years old, and I still take lessons from a fine musician in
Colorado. I enjoy reading, studying new languages, and skiing (although
my back is no longer in the best shape). I also enjoy spending
time in the mountains and time with my kids and grandkids.
===
[1] Wall Street Journal Digital Network, MarketWatch, October 29, 2008
[2] Ron Aitchison, "Pro DNS and BIND," Apress, 2005
[3] NIST Special Publication 800-81, "Secure Domain Name System (DNS) Deployment Guide"
[4] David Grawrock, "The Intel Safer Computing Initiative," Intel Press, 2006