#1113 Using PIE by default on AMD64
Closed None Opened 10 years ago by halfie.

http://fedoraproject.org/wiki/Hardened_Packages page mentions that "FESCo
requires some packages to use PIE and relro hardening by default."

I am proposing that hardening flags (including PIE and RELRO) should be turned on
by default for all packages on AMD64.

  1. https://wiki.ubuntu.com/Security/Features says "PIE on x86_64 does not have
    the same penalties, and will eventually be made the default, but more testing
    is required."

What about Fedora taking the lead on this one?

  1. http://www.openbsd.org/53.html says,

"Position-independent executables (PIE) are now used by default on alpha,
amd64, hppa, landisk, loongson, sgi and sparc64."

I wish Fedora did the same on AMD64.

  1. Addressing concerns mentioned in https://fedorahosted.org/fesco/ticket/1104

"PIE disables use of prelink - so this is another performance impact on
startup. On the other hand we should evaluate the impact of non-prelinked
vs. prelinked startup time on modern computers, maybe it is no longer much
relevant"

Please see http://people.redhat.com/~gmurphy/files/pie.odt for such an
evaluation.

In short. "... the average delay of a PIE application over a non-PIE
application was significantly below perceivable threshold."

"I guess PIE has some impact on performance. Therefore I'd rather use PIE on
limited list of packages. Databases might be a good addition into the current
group."

ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/766.pdf mentions an average
overhead of 3.6% on AMD64 (x64) which is not too bad (considering the
benefits it provides).

I was able to independently verify some of the numbers present in this
paper by using "unSPEC" (https://github.com/kholia/unSPEC).

FWIW, Ubuntu has been shipping PIE-enabled Firefox for years now
(https://bugs.launchpad.net/ubuntu/+source/xulrunner-1.9.1/+bug/507744). I
repeated the benchmarks (mentioned in the above bug report) for Firefox 20.0
running on Fedora 18 64-bit.

http://dromaeo.com/?id=193034,193041,193043,193080,193080,193081,193082

The first four columns are stock Firefox and the last two columns are PIE-enabled Firefox. There seem to be no performance regressions (at least not in the Dromaeo
JavaScript performance testing tool). Additionally, there was no perceivable
difference in the load time of Firefox.

I recommend running your own independent benchmarks to confirm this.

Similarly, there are no performance regressions (after enabling PIE) even in
popular CPU intensive applications like Gimp and MongoDB.

Furthermore, large programs do the bulk of their computations in
application-specific DSOs already. This applies to Firefox, among others.
httpd is another example which already uses zlib and OpenSSL DSOs to do
(CPU intensive) operations like compression and encryption.

"I'm not sure that in the study above the non-PIE binaries were prelinked or
not. Also it would be more interesting to see the result for bigger desktop
applications like LibreOffice, Firefox, Evolution."

http://en.wikipedia.org/wiki/Prelink mentions "Jakub Jelínek points out that
position independent executables ignore prelinking on Red Hat Enterprise
Linux and Fedora Core, and recommends that network and SUID programs be built
PIE to facilitate a more secure environment."

In short, "... the average delay of a PIE application over a non-PIE
application was significantly below perceivable threshold". Please note that
"position independent executables ignore prelinking on Red Hat Enterprise
Linux and Fedora Core", so the numbers presented in Grant's report should be good as they are.

I can do more benchmarks and analysis if required. '''What else do you think is needed from my side to move this ticket forward?'''


Removal of prelink would also benefit FIPS mode, and get rid of problems where prelinking causes FIPS mode failures on crypto libraries.

In the past, I have advocated for automatic prelink -ua when booted with fips=1, and the only argument against that was "loss of speed". If these numbers prove that speed issues are marginal, I'm in favour of removing prelink altogether and using full relro/pie by default for everything.

Apologies on not getting this on the meeting schedule. Scheduling it for next week's meeting.

Jakub, what do you think about the proposal?

For the record, a reply from Jakub:

I think it is a really very bad idea. PIE really isn't free, not even on
AMD64, the cost is serious, and the advantages are really small for
programs that aren't in the current hardened categories (network facing
daemons, programs with bad security history, programs run with elevated
privileges).

At the 2013-05-22 meeting:

Agreed: defer, ask Jakub/tools team for a test case that we can get numbers of where performance degrades and to what extent.

So, the kind of info we are looking for here is more real world. If we enabled this by default, can we see real world cases where "the cost is serious" in Fedora packages? Are there some package types that would be hit harder than others?

Also, we would want an easy way for people to opt out of this.

  1. Additionally, no performance regressions were observed in the PIE builds of beanstalkd, redis and MongoDB. (I have posted some of the benchmarking scripts on GitHub, https://github.com/kholia/unSPEC).

  2. PIE builds of some 100% CPU bound programs like Crafty, Sjeng and Gzip are slower by 2% to 3.5%. SPEC suite uses these and other CPU bound programs, which are affected the most by hardening. I can include more applications in my benchmarks (suggestions are welcome!).

  3. Instead of maintaining a list of packages which need to be hardened, is it possible (and better) instead to maintain a list of packages which are exempt from "PIE as default on AMD64" rule?

This list can contain packages for which PIE is causing problems (application failures, performance problems). I am very OK with the idea of not having a 100% PIE system.

  1. I would also like to point out that a lot of "performance sensitive" applications like httpd, MySQL are already PIE and other apps like MongoDB and Samba will be PIE soon too.

A lot of folks seem to be "interested" in prelink stuff for various reasons. In my (limited) testing,
I have found that even for large desktop applications, like LibreOffice, enabling prelink doesn't
provide "large" speedups.

E.g. enabling prelink results in LibreOffice starting only a little bit faster (250 ms in the best
case). I am running 2009-class hardware. On the latest hardware, this 250 ms figure should shrink further.

I am not sure if anyone can notice (or cares about) this 250 ms speedup in starting LibreOffice (which is
already infamous for its huge start-up time).

(Hopefully, I haven't screwed up this benchmark!)

Posted new audio encoding benchmarks on https://github.com/kholia/unSPEC

In short,

There seem to be no measurable performance regressions in LAME and FLAC packages.

(Maybe I need a machine with less noise to measure the difference!)

Maintainers / Packagers,

If you want a particular package benchmarked, please let me know.

2 questions:

(For those that think PIE should be default): Can we get PIE without eliminating the benefits of prelink? Prelink is not as useful as PIE for security, and the 250 ms delay mentioned in comment:7 is rather large.

(For those that think PIE shouldn't be default): What kinds of applications would be particularly hurt by enabling PIE? Are there too many of them for the maintainers to manually disable PIE? AFAICS the necessary conditions are:
1. Built in the distribution
2. CPU-bound (or CPU-limited in the primary performance metric)
3. Not required to use PIE already (= not running as root, not a daemon)

It seems that most user-facing applications don't qualify due to "2."

I'd reiterate that the advantages of address space randomization on x86-64 are grossly exaggerated, given that the first 6 arguments are passed in registers (thus it is harder to construct the right arguments compared to say i?86 where just some stack buffer overflow can lead to specifying both where to return to and what arguments should be passed to it) and because address space on x86-64 is never really randomized anyway (due to historical mistake, the vsyscall page is not randomized, so you can always return to vsyscall page if you can't easily return to libc or some other library).

I'll ask my colleague to run SPEC2k6 to get real numbers, but e.g. on a tiny test like:
test1.c:
{{{

/* test1.c: 100 functions, each incrementing an external global. */
#define X(n) \
extern unsigned int a##n; \
\
void \
foo##n (void) \
{ \
  a##n++; \
}

#define Y(n) X(n##0) X(n##1) X(n##2) X(n##3) X(n##4) X(n##5) X(n##6) X(n##7) X(n##8) X(n##9)

#define Z(n) Y(n##0) Y(n##1) Y(n##2) Y(n##3) Y(n##4) Y(n##5) Y(n##6) Y(n##7) Y(n##8) Y(n##9)

/* Defines foo100 () through foo199 (). */
Z(1)
}}}
test2.c:
{{{

/* test2.c: defines the globals and calls every function in a hot loop. */
#define X(n) \
unsigned int a##n; \
extern void foo##n (void);

#define Y(n) X(n##0) X(n##1) X(n##2) X(n##3) X(n##4) X(n##5) X(n##6) X(n##7) X(n##8) X(n##9)

#define Z(n) Y(n##0) Y(n##1) Y(n##2) Y(n##3) Y(n##4) Y(n##5) Y(n##6) Y(n##7) Y(n##8) Y(n##9)

Z(1)

#undef X

int
main ()
{
  int i;
  for (i = 0; i < 10000000; i++)
    {
/* Redefine X so that Z(1) now expands to 100 calls. */
#define X(n) foo##n ();
      Z(1)
    }
  return 0;
}
}}}
With -O2 -o test test{1,2}.c vs. -O2 -o testpie test{1,2}.c -fpie -pie, the difference is about 21% on my box (still with tons of caching, since the loop does the same thing millions of times). PIE code is bigger and has a larger I-cache and D-cache footprint, so the program will eat more memory (part of that per-process rather than per-program), thrash more caches, etc.
When we designed PIE, it was never meant to be used for everything.

I don't understand the FIPS-related code. FIPS checksumming is a joke: it verifies a few libraries, but when it doesn't start with verification of the dynamic linker, libc and libdl, that verification is completely useless, and it isn't run in the common case. I have no problem if people running FIPS mode disable prelink, though there is a way to do the useless checksumming even with prelink around.

Replying to [comment:10 jakub]:

I'll ask my colleague to run SPEC2k6 to get real numbers, but e.g. on a tiny test like:

Already done, see ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/766.pdf

"""
A quick evaluation for x64 reports an average overhead
of 3.61% and a geometric mean of 2.34% for an -O3
optimization level on the same system using the “test” dataset of
SPEC CPU2006.
"""

And (almost) all SPEC CPU2006 programs are CPU bound. So they should be impacted most by PIE.

Even in normal CPU intensive tasks like encoding audio, I couldn't measure the performance impact of PIE.

It is always possible to trigger "bad edge" cases (like you have done). We need to figure out (collectively) if such problems exist in existing "real-world" packages we have in Fedora (or upstream).

Replying to [comment:9 mitr]:

2 questions:

(For those that think PIE should be default): Can we get PIE without eliminating the benefits of prelink? Prelink is not as useful as PIE for security, and the 250 ms delay mentioned in comment:7 is rather large.

I don't think so.

If I understand correctly, prelink is an optimization which happens to be anti-security. Not complaining here ;)

(For those that think PIE shouldn't be default): What kinds of applications would be particularly hurt by enabling PIE? Are there too many of them for the maintainers to manually disable PIE? AFAICS the necessary conditions are:
1. Built in the distribution
2. CPU-bound (or CPU-limited in the primary performance metric)
3. Not required to use PIE already (= not running as root, not a daemon)

It seems that most user-facing applications don't qualify due to "2."

Even if there are a lot of type "2" applications, they might be doing the CPU-intensive work in library calls. LAME using libmp3lame.so.0 is one example. So we have nothing (more) to lose for these types of applications by enabling PIE.

...

My primary concern at this point is to identify applications for which PIE will cause problems. Ideally, I want to estimate the size of the list which will contain packages for which PIE should be disabled. I don't want that list to be too big to become a maintenance problem.

Replying to [comment:11 halfie]:

Replying to [comment:10 jakub]:

I'll ask my colleague to run SPEC2k6 to get real numbers, but e.g. on a tiny test like:

Already done, see ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/766.pdf

"""
A quick evaluation for x64 reports an average overhead
of 3.61% and a geometric mean of 2.34% for an -O3
optimization level on the same system using the “test” dataset of
SPEC CPU2006.
"""

That is with -O3 rather than the -O2 that is used in the distro. What we really want to see is a comparison of the compilation flags the distro actually uses (-O2 -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 etc.) vs. the same with -pie -fpie. Anyway, 3.61% is a significant slowdown, worth several years of compiler improvements. You haven't responded at all to the comments about the overestimated effect of address space randomization on security; it is just one of many factors, and definitely not the most important one.

Talking about prelink as an anti-security optimization is just ridiculous. The most important thing for security is properly written code, then measures that prevent exploits (-D_FORTIFY_SOURCE=2, -fsanitize=address, -fmudflap, SELinux), and only then measures that make exploits less likely to succeed (address space randomization).

For the hardened category I'm all for increasing security measures, be it -fstack-protector-strong, or other ways I'm not allowed to talk about just yet.

The proposal was rejected for now at today's FESCo meeting.
* proposal is rejected (+2 -5 0:2)

http://seclists.org/oss-sec/2014/q1/356

time to file another ticket to enable PIE by default on AMD64?

I can't see any new information ("I expect $number performance impact" alone is not really information) in that email, so why should FESCo decide differently when given the same options?

Or is this a proposal to mimic Windows and move to text relocation at load time?

I was reading https://lwn.net/Articles/620191/ (LWN subscriber-only content, code execution bugs in strings command) and it got me thinking about this ticket again.

Even mobile phones (with heavy power and performance constraints) require PIE these days.

I can dedicate some personal time to re-submitting (or re-opening) this ticket and moving it forward, if you guys think that the "right time" has arrived.

Looking at http://meetbot.fedoraproject.org/fedora-meeting/2013-05-29/fesco.2013-05-29-18.01.log.html the decisive factor was objections of Jakub. Has that changed?

I’m not sure that “time” has much to do with it. OTOH time has flown since, prelink is gone by default (though again that was not really the decisive factor) and various stakeholders have been replaced. So I wouldn’t completely rule out a reopened discussion with few or no new arguments arriving at a different conclusion, but I would really not bet on it either.

(Ceterum autem censeo C linguam esse delendam.)

I don't think anything has changed; I still strongly object to that.

(Reviewing old discussions, just a minor point for the record and future discussions)

Replying to [comment:10 jakub]:

I'd reiterate that the advantages of address space randomization on x86-64 are grossly exaggerated, given that the first 6 arguments are passed in registers (thus it is harder to construct the right arguments compared to say i?86 where just some stack buffer overflow can lead to specifying both where to return to and what arguments should be passed to it)

Putting the right values into the right registers is a routine problem with a routine solution, finding “ROP gadgets” (e.g. pop rsi; ret) in the address space of the victim, and chaining calls through a few such gadgets to set all values as necessary.

Address space randomization (and hoping for no leaks in program output) is one of the very few conceptual mitigations we have for the ~inevitable memory misuses in programs written in C-like languages. It would be reasonable not to force PIE use and the associated performance penalty for memory-safe languages, though (if we have any such compiled languages in wide enough use to worry about that at all).
