Ticket #1113 (closed task: wontfix)

Opened 2 years ago

Last modified 8 months ago

Using PIE by default on AMD64

Reported by: halfie
Owned by:
Priority: major
Keywords: meeting
Cc: jakub
Blocked By:
Blocking:

Description

http://fedoraproject.org/wiki/Hardened_Packages page mentions that "FESCo requires some packages to use PIE and relro hardening by default."

I am proposing that hardening flags (including PIE and RELRO) should be turned on by *default* for *all* packages on AMD64.
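One concrete way to see what "PIE by default" changes is the ELF header itself: a PIE executable is linked as an object of type ET_DYN (the same type as a shared library), while a traditional fixed-address executable has type ET_EXEC. The following is a minimal sketch, assuming Linux with glibc's <elf.h> and a little-endian host reading little-endian files (as on AMD64); the function names are hypothetical. Because shared libraries are also ET_DYN, the check only separates PIE from non-PIE for files already known to be executables.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Classifies a buffer holding the start of an ELF file. Returns 1 for a 64-bit
 * ELF of type ET_DYN (PIE executables and shared libraries), 0 for ET_EXEC
 * (a fixed-address executable), and -1 for anything else. Assumes the file's
 * byte order matches the host's (little-endian on AMD64). */
int elf64_header_is_dyn(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr hdr;
    assert(buf != NULL);
    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr); /* copy to avoid unaligned access */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (hdr.e_type == ET_DYN)
        return 1;
    if (hdr.e_type == ET_EXEC)
        return 0;
    return -1;
}

/* Reads the ELF header of the file at `path`; same return values as above. */
int file_is_dyn_elf(const char *path)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return elf64_header_is_dyn(buf, n);
}
```

For example, file_is_dyn_elf("/proc/self/exe") reports on the running program itself.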

  1. https://wiki.ubuntu.com/Security/Features says "PIE on x86_64 does not have the same penalties, and will eventually be made the default, but more testing is required."

What about Fedora taking the lead on this one?

  2. http://www.openbsd.org/53.html says:

"Position-independent executables (PIE) are now used by default on alpha, amd64, hppa, landisk, loongson, sgi and sparc64."

I wish Fedora did the same on AMD64.

  3. Addressing concerns mentioned in https://fedorahosted.org/fesco/ticket/1104:

"PIE disables use of prelink - so this is another performance impact on startup. On the other hand we should evaluate the impact of non-prelinked vs. prelinked startup time on modern computers, maybe it is no longer very relevant"

Please see http://people.redhat.com/~gmurphy/files/pie.odt for such an evaluation.

In short: "... the average delay of a PIE application over a non-PIE application was significantly below perceivable threshold."

"I guess PIE has some impact on performance. Therefore I'd rather use PIE on a limited list of packages. Databases might be a good addition to the current group."

ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/766.pdf mentions an average overhead of 3.6% on AMD64 (x64) which is not too bad (considering the benefits it provides).

I was able to independently verify some of the numbers present in this paper by using "unSPEC" (https://github.com/kholia/unSPEC).

FWIW, Ubuntu has been shipping PIE-enabled Firefox for years now: https://bugs.launchpad.net/ubuntu/+source/xulrunner-1.9.1/+bug/507744. I repeated the benchmarks (mentioned in the above bug report) for Firefox 20.0 running on Fedora 18 64-bit.

http://dromaeo.com/?id=193034,193041,193043,193080,193080,193081,193082
The first four columns are stock Firefox and the last two columns are PIE-enabled Firefox. There seem to be no performance regressions, at least not in the Dromaeo JavaScript performance testing tool. Additionally, there was no perceivable difference in the load time of Firefox.

I recommend running your own independent benchmarks to confirm this.
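For anyone running their own benchmarks, a small timing harness helps keep PIE and non-PIE runs comparable. This is a sketch, not the unSPEC scripts: it assumes a POSIX system with clock_gettime, and `workload` stands for whatever hypothetical function wraps the code under test. Reporting the median of several runs rather than a single run reduces the effect of background noise.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict ISO modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* qsort comparator for doubles (avoids truncating a - b to int). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples. Works on a sorted copy so the caller's array keeps
 * its original (chronological) order. */
double median(const double *samples, size_t n)
{
    assert(n > 0);
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2 == 1) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
    free(tmp);
    return m;
}

/* Runs workload() `runs` times, records each wall-clock duration in seconds
 * (CLOCK_MONOTONIC, so unaffected by clock adjustments) in out[0..runs-1],
 * and returns the median duration. */
double time_workload(void (*workload)(void), double *out, size_t runs)
{
    for (size_t i = 0; i < runs; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        workload();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        out[i] = (double)(t1.tv_sec - t0.tv_sec) +
                 (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }
    return median(out, runs);
}
```

Building the same file once with -fpie -pie and once without, and comparing the two medians, gives a per-workload overhead figure.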

Similarly, there are no performance regressions (after enabling PIE) even in popular CPU-intensive applications like GIMP and MongoDB.

Furthermore, large programs do the bulk of their computations in application-specific DSOs already. This applies to Firefox, among others. httpd is another example which already uses zlib and OpenSSL DSOs to do (CPU-intensive) operations like compression and encryption.

"I'm not sure whether the non-PIE binaries in the study above were prelinked or not. Also, it would be more interesting to see the results for bigger desktop applications like LibreOffice, Firefox, Evolution."

http://en.wikipedia.org/wiki/Prelink mentions "Jakub Jelínek points out that position independent executables ignore prelinking on Red Hat Enterprise Linux and Fedora Core, and recommends that network and SUID programs be built PIE to facilitate a more secure environment."

In short, "... the average delay of a PIE application over a non-PIE application was significantly below perceivable threshold". Please note that "position independent executables ignore prelinking on Red Hat Enterprise Linux and Fedora Core", so the numbers presented in Grant's report should be good as they are.

I can do more benchmarks and analysis if required. What else do you think is needed from my side to move this ticket forward?

Change History

comment:1 Changed 2 years ago by pwouters

Removal of prelink would also benefit FIPS mode, and get rid of problems where prelinking causes FIPS mode failures on crypto libraries.

In the past, I have advocated for automatic prelink -ua when booted with fips=1, and the only argument against that was "loss of speed". If these numbers prove that speed issues are marginal, I'm in favour of removing prelink altogether and using full RELRO/PIE by default for everything.

comment:2 Changed 2 years ago by notting

  • Keywords meeting added

Apologies for not getting this on the meeting schedule. Scheduling it for next week's meeting.

comment:3 Changed 2 years ago by mitr

  • Cc jakub added

Jakub, what do you think about the proposal?

comment:4 Changed 2 years ago by mitr

For the record, a reply from Jakub:

I think it is a really very bad idea. PIE really isn't free, not even on AMD64, the cost is serious, and the advantages are really small for programs that aren't in the current hardened categories (network-facing daemons, programs with a bad security history, programs run with elevated privileges).

comment:5 Changed 2 years ago by kevin

At the 2013-05-22 meeting:

Agreed: defer, ask Jakub/tools team for a test case that gives us numbers showing where performance degrades and to what extent.

So, the kind of info we are looking for here is more real world. If we enabled this by default, can we see real world cases where "the cost is serious" in Fedora packages? Are there some package types that would be hit harder than others?

Also, we would want an easy way for people to opt out of this.

comment:6 Changed 2 years ago by halfie

  1. Additionally, no performance regressions were observed in the PIE builds of beanstalkd, redis and MongoDB. (I have posted some of the benchmarking scripts on GitHub, https://github.com/kholia/unSPEC.)
  2. PIE builds of *some* 100% CPU-bound programs like Crafty, Sjeng and Gzip are slower by 2% to 3.5%. The SPEC suite uses these and other CPU-bound programs, which are affected the most by hardening. I can include more applications in my benchmarks (suggestions are welcome!).
  3. Instead of maintaining a list of packages which need to be hardened, is it possible (and *better*) to maintain a list of packages which are exempt from the "PIE as default on AMD64" rule?

This list can contain packages for which PIE is causing problems (application failures, performance problems). I am *very OK* with the idea of not having a 100% PIE system.

  4. I would also like to point out that a lot of "performance-sensitive" applications like httpd and MySQL are already PIE, and other apps like MongoDB and Samba will be PIE soon too.
Last edited 2 years ago by halfie
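The exemption-list idea in point 3 inverts the policy: every package gets PIE unless it is listed. A minimal sketch of that lookup follows, assuming a sorted, compiled-in list; the package names and the function are hypothetical placeholders, and real distribution tooling would more likely express this in its packaging macros than in C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical exemption list: packages that would be built without PIE.
 * Entries must stay sorted in strcmp() order because lookup uses bsearch(). */
static const char *const pie_exempt[] = {
    "example-cpu-bound-tool",
    "example-legacy-app",
};

/* bsearch comparator: key is the package name, elem points to an array entry. */
static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Default-on policy: returns 1 (use PIE) unless the package is listed as exempt. */
int package_uses_pie(const char *name)
{
    assert(name != NULL);
    size_t count = sizeof pie_exempt / sizeof pie_exempt[0];
    return bsearch(name, pie_exempt, count, sizeof pie_exempt[0], cmp_name) == NULL;
}
```

The design keeps the common case (no entry) free of any per-package configuration, which is the maintenance argument made above.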

comment:7 Changed 2 years ago by halfie

A lot of folks seem to be "interested" in prelink stuff for various reasons. In my (limited) testing, I have found that even for large desktop applications like LibreOffice, enabling prelink doesn't provide "large" speedups.

E.g. enabling prelink makes LibreOffice start only a little faster (250 ms in the best case). I am running 2009-class hardware; on the latest hardware, this 250 ms figure should be even smaller.

I am not sure anyone can notice (or cares about) this 250 ms speedup in starting LibreOffice (which is already infamous for its huge start-up time).

(Hopefully, I haven't screwed up this benchmark!)

comment:8 Changed 2 years ago by halfie

Posted new audio encoding benchmarks on https://github.com/kholia/unSPEC

In short,

There seem to be no measurable performance regressions in LAME and FLAC packages.

(Maybe I need a machine with less noise to measure the difference!)

Maintainers / Packagers,

If you want a particular package benchmarked, please let me know.

comment:9 follow-up: ↓ 12 Changed 2 years ago by mitr

2 questions:

(For those that think PIE should be default): Can we get PIE without eliminating the benefits of prelink? Prelink is not as useful as PIE for security, and the 250 ms delay mentioned in comment:7 is rather large.

(For those that think PIE shouldn't be default): What kinds of applications would be particularly hurt by enabling PIE? Are there too many of them for the maintainers to manually disable PIE? AFAICS the necessary conditions are:

  1. Built in the distribution
  2. CPU-bound (or CPU-limited in the primary performance metric)
  3. Not already required to use PIE (= not running as root, not a daemon)

It seems that most user-facing applications don't qualify due to "2."

comment:10 follow-ups: ↓ 11 ↓ 20 Changed 2 years ago by jakub

I'd reiterate that the advantages of address space randomization on x86-64 are grossly exaggerated, given that the first 6 arguments are passed in registers (thus it is harder to construct the right arguments than on, say, i?86, where just some stack buffer overflow can lead to specifying both where to return to and what arguments should be passed to it), and because the address space on x86-64 is never really randomized anyway (due to a historical mistake, the vsyscall page is not randomized, so you can always return to the vsyscall page if you can't easily return to libc or some other library).

I'll ask my colleague to run SPEC2k6 to get real numbers, but consider, e.g., a tiny test like this:

test1.c:

#define X(n) \
extern unsigned int a##n; \
\
void \
foo##n (void) \
{ \
  a##n++; \
}
#define Y(n) X(n##0) X(n##1) X(n##2) X(n##3) X(n##4) X(n##5) X(n##6) X(n##7) X(n##8) X(n##9)
#define Z(n) Y(n##0) Y(n##1) Y(n##2) Y(n##3) Y(n##4) Y(n##5) Y(n##6) Y(n##7) Y(n##8) Y(n##9)
Z(1)

test2.c:

#define X(n) \
unsigned int a##n; \
extern void foo##n (void);
#define Y(n) X(n##0) X(n##1) X(n##2) X(n##3) X(n##4) X(n##5) X(n##6) X(n##7) X(n##8) X(n##9)
#define Z(n) Y(n##0) Y(n##1) Y(n##2) Y(n##3) Y(n##4) Y(n##5) Y(n##6) Y(n##7) Y(n##8) Y(n##9)
Z(1)
#undef X
int
main ()
{
  int i;
  for (i = 0; i < 10000000; i++)
    {
      #define X(n) foo##n ();
      Z(1)
    }
  return 0;
}

With -O2 -o test test{1,2}.c vs. -O2 -o testpie test{1,2}.c -fpie -pie, the difference is about 21% on my box (and that is still with tons of caching, since the loop does the same thing millions of times). PIE code is bigger and has both a larger I-cache and a larger D-cache footprint, so the program will eat more memory (part of that per-process rather than per-program), thrash more caches, etc. When we designed PIE, it was never meant to be used for everything.

I don't understand the FIPS-related argument. FIPS checksumming is a joke: it verifies a few libraries, but when it doesn't start with verification of the dynamic linker, libc and libdl, that verification is completely useless, and it isn't run in the common case anyway. I have no problem if people running FIPS mode disable prelink, though there is a way to do the useless checksumming even with prelink around.
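The remark about the vsyscall page can be checked on a running x86-64 Linux system: the legacy vsyscall page is mapped at the fixed address 0xffffffffff600000 and appears in /proc/self/maps as [vsyscall]. The sketch below finds a named mapping in text in /proc/<pid>/maps format; it assumes a 64-bit unsigned long (LP64), and the function name is hypothetical. Depending on kernel version and boot options (for example vsyscall=none), the entry may be absent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scans text in /proc/<pid>/maps format for the first line that ends with
 * `name` (for example "[vsyscall]") and stores that mapping's start address
 * in *start. Returns 1 if found, 0 otherwise. */
int find_mapping_start(const char *maps, const char *name, unsigned long *start)
{
    assert(maps != NULL && name != NULL && start != NULL);
    size_t name_len = strlen(name);
    const char *line = maps;
    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        /* The pathname (or pseudo-name) is the last field on the line. */
        if (len >= name_len && memcmp(line + len - name_len, name, name_len) == 0)
            return sscanf(line, "%lx-", start) == 1;
        line = end ? end + 1 : NULL;
    }
    return 0;
}
```

Reading /proc/self/maps into a buffer and calling find_mapping_start(buf, "[vsyscall]", &addr) across several runs of a PIE binary should report the same address every time, while the program's own mappings move.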

comment:11 in reply to: ↑ 10 ; follow-up: ↓ 13 Changed 2 years ago by halfie

Replying to jakub:

> I'll ask my colleague to run SPEC2k6 to get real numbers, but e.g. on a tiny test like:

Already done, see ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/766.pdf

"A quick evaluation for x64 reports an average overhead of 3.61% and a geometric mean of 2.34% for an -O3 optimization level on the same system using the “test” dataset of SPEC CPU2006."

And (almost) all SPEC CPU2006 programs are CPU-bound, so they should be impacted the most by PIE.

Even in normal CPU-intensive tasks like encoding audio, I couldn't measure the performance impact of PIE.

It is always possible to trigger "bad edge" cases (like you have done). We need to figure out (collectively) whether such problems exist in the "real-world" packages we have in Fedora (or upstream).
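The paper reports two summaries of the same per-benchmark data: an average overhead of 3.61% and a geometric mean of 2.34%. The geometric mean of the slowdown ratios is never larger than their arithmetic mean, and a few benchmarks with large slowdowns pull the arithmetic mean up more. The sketch below computes both from per-benchmark run times; it is an illustration with hypothetical function names, not necessarily how the paper derived its figures. To stay free of a libm dependency, the n-th root is found by bisection instead of exp/log.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of per-benchmark overheads (pie/base - 1), in percent. */
double mean_overhead_pct(const double *base, const double *pie, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += pie[i] / base[i] - 1.0;
    return 100.0 * sum / (double)n;
}

/* x to a non-negative integer power by repeated multiplication. */
static double ipow(double x, size_t n)
{
    double r = 1.0;
    while (n--)
        r *= x;
    return r;
}

/* Geometric-mean overhead: (n-th root of the product of ratios) - 1, in percent. */
double geomean_overhead_pct(const double *base, const double *pie, size_t n)
{
    assert(n > 0);
    double lo = pie[0] / base[0], hi = lo, product = 1.0;
    for (size_t i = 0; i < n; i++) {
        double r = pie[i] / base[i];
        product *= r;
        if (r < lo) lo = r;
        if (r > hi) hi = r;
    }
    /* The root lies between the smallest and largest ratio, and x^n is
     * increasing for x > 0, so bisection converges to it. */
    for (int iter = 0; iter < 200; iter++) {
        double mid = (lo + hi) / 2.0;
        if (ipow(mid, n) < product)
            lo = mid;
        else
            hi = mid;
    }
    return 100.0 * ((lo + hi) / 2.0 - 1.0);
}
```

With two hypothetical benchmarks, one unchanged and one 21% slower, the arithmetic mean overhead is 10.5% while the geometric mean is 10%.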

comment:12 in reply to: ↑ 9 Changed 2 years ago by halfie

Replying to mitr:

> 2 questions:

> (For those that think PIE should be default): Can we get PIE without eliminating the benefits of prelink? Prelink is not as useful as PIE for security, and the 250 ms delay mentioned in comment:7 is rather large.

I don't think so.

If I understand correctly, prelink is an optimization which happens to be anti-security. Not complaining here ;)

> (For those that think PIE shouldn't be default): What kinds of applications would be particularly hurt by enabling PIE? Are there too many of them for the maintainers to manually disable PIE? AFAICS the necessary conditions are:
>
>   1. Built in the distribution
>   2. CPU-bound (or CPU-limited in the primary performance metric)
>   3. Not already required to use PIE (= not running as root, not a daemon)
>
> It seems that most user-facing applications don't qualify due to "2."

Even if there are a lot of type "2" applications, they might be doing the CPU-intensive work in library calls; LAME using libmp3lame.so.0 is one example. So we have nothing (more) to lose for these types of applications by enabling PIE.

...

My primary concern at this point is to identify applications for which PIE will cause problems. Ideally, I want to estimate the size of the list which will contain packages for which PIE *should* be disabled. I don't want that list to grow so big that it becomes a maintenance problem.

Last edited 2 years ago by halfie

comment:13 in reply to: ↑ 11 Changed 2 years ago by jakub

Replying to halfie:

> Replying to jakub:
>
>> I'll ask my colleague to run SPEC2k6 to get real numbers, but e.g. on a tiny test like:
>
> Already done, see ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/766.pdf
>
> "A quick evaluation for x64 reports an average overhead of 3.61% and a geometric mean of 2.34% for an -O3 optimization level on the same system using the “test” dataset of SPEC CPU2006."

That is with -O3 rather than the -O2 that is used in the distro etc. What we really want to see is a comparison of the compilation flags the distro actually uses (-O2 -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 etc.) vs. the same flags with -pie -fpie. Anyway, 3.61% is a significant slowdown, comparable to several years of compiler improvements. You haven't responded at all to the comments about the overestimation of address space randomization for security; it is just one of many factors, and definitely not the most important one.

Talking about prelink as an anti-security optimization is just ridiculous. The most important thing for security is properly written code, then measures that prevent exploits (-D_FORTIFY_SOURCE=2, -fsanitize=address, -fmudflap, SELinux), and only then measures that make exploits less likely to succeed (address space randomization).

For the hardened category I'm all for increasing security measures, be it -fstack-protector-strong, or other ways I'm not allowed to talk about just yet.
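Comparing the distribution's usual flags against the same flags plus -fpie -pie only works if the two builds really differ in nothing else. One cheap sanity check is to have each benchmark build report how it was compiled. GCC and Clang predefine __PIE__ when compiling with -fpie/-fPIE and __OPTIMIZE__ when optimization is enabled, and _FORTIFY_SOURCE is whatever value the build passed with -D. This is a sketch with a hypothetical function name; it reflects the flags of this translation unit only, not how the final binary was linked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Describes how this translation unit was compiled, using predefined macros:
 * __PIE__ (set by GCC/Clang for -fpie/-fPIE), __OPTIMIZE__ (set when
 * optimizing) and _FORTIFY_SOURCE (set by the build, e.g. -D_FORTIFY_SOURCE=2). */
void describe_build(char *buf, size_t len)
{
#ifdef __PIE__
    const char *pie = "pie";
#else
    const char *pie = "no-pie";
#endif
#ifdef __OPTIMIZE__
    const char *opt = "optimized";
#else
    const char *opt = "unoptimized";
#endif
#ifdef _FORTIFY_SOURCE
    int fortify = _FORTIFY_SOURCE;
#else
    int fortify = 0;
#endif
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s %s fortify=%d", pie, opt, fortify);
}
```

Printing this string at the start of each benchmark run makes an accidental flag mismatch between the two builds visible immediately.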

comment:14 Changed 2 years ago by tmraz

  • Resolution set to wontfix
  • Status changed from new to closed

The proposal was rejected for now at today's FESCo meeting.

  • proposal is rejected (+2 -5 0:2)

comment:15 Changed 19 months ago by halfie

http://seclists.org/oss-sec/2014/q1/356

Time to file another ticket to enable PIE by default on AMD64?

comment:16 Changed 19 months ago by mitr

I can't see any new information ("I expect $number performance impact" alone is not really information) in that email, so why should FESCo decide differently when given the same options?

Or is this a proposal to mimic Windows and move to text relocation at load time?

comment:17 Changed 10 months ago by halfie

I was reading https://lwn.net/Articles/620191/ (LWN subscriber-only content, code execution bugs in strings command) and it got me thinking about this ticket again.

Even mobile phones (with heavy power and performance constraints) *require* PIE these days.

I can dedicate some personal time to re-submitting (or re-opening) this ticket and moving it forward, if you guys think that the "right time" has arrived.

comment:18 Changed 10 months ago by mitr

Looking at http://meetbot.fedoraproject.org/fedora-meeting/2013-05-29/fesco.2013-05-29-18.01.log.html the decisive factor was objections of Jakub. Has that changed?

I’m not sure that “time” has much to do with it. OTOH, time has flown since then, prelink is gone by default (though again, that was not really the decisive factor), and various stakeholders have been replaced. So I wouldn’t completely rule out a reopened discussion with few or no new arguments arriving at a different conclusion, but I would really not bet on it either.

(Ceterum autem censeo C linguam esse delendam: "Furthermore, I consider that the C language must be destroyed.")

comment:19 Changed 10 months ago by jakub

I don't think anything has changed; I still strongly object to that.

comment:20 in reply to: ↑ 10 Changed 8 months ago by mitr

(Reviewing old discussions, just a minor point for the record and future discussions)

Replying to jakub:

> I'd reiterate that the advantages of address space randomization on x86-64 are grossly exaggerated, given that the first 6 arguments are passed in registers (thus it is harder to construct the right arguments compared to say i?86 where just some stack buffer overflow can lead to specifying both where to return to and what arguments should be passed to it)

Putting the right values into the right registers is a routine problem with a routine solution, finding “ROP gadgets” (e.g. pop rsi; ret) in the address space of the victim, and chaining calls through a few such gadgets to set all values as necessary.

Address space randomization (and hoping for no leaks in program output) is one of the very few conceptual mitigations we have for the ~inevitable memory misuses in programs written in C-like languages. It would be reasonable not to force PIE use and the associated performance penalty for memory-safe languages, though (if we have any such compiled languages in wide enough use to worry about that at all).

Last edited 8 months ago by mitr
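The caveat about leaks in program output can be made concrete. With PIE and ASLR, the load address of the executable's code changes between runs, but the distance between two functions inside it is fixed by the linker, so a single leaked code address reveals where every gadget in the binary is. A minimal sketch, assuming Linux on x86-64, where converting a function pointer to an integer yields its address; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Two ordinary functions in the same executable. */
int anchor_a(int x) { return x + 1; }
int anchor_b(int x) { return x * 2; }

/* Absolute code address of anchor_a: with PIE and ASLR this changes from run
 * to run; without PIE it is fixed at link time. (Converting a function pointer
 * to an integer is implementation-defined, but works on Linux/x86-64.) */
uintptr_t code_address(void)
{
    return (uintptr_t)&anchor_a;
}

/* Distance between the two functions: fixed by the linker, so identical in
 * every run. Leaking any single code address therefore reveals all the others. */
intptr_t code_offset(void)
{
    return (intptr_t)((uintptr_t)&anchor_b - (uintptr_t)&anchor_a);
}
```

Printing code_address() from a small main and running a PIE build twice should show different values (with ASLR enabled); a non-PIE build prints the same value each time, while code_offset() is the same in every case.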