======================================================================
Retrace server design
======================================================================

The retrace server provides a coredump analysis and backtrace
generation service over a network using the HTTP protocol.

----------------------------------------------------------------------
Contents
----------------------------------------------------------------------

1. Overview
2. HTTP interface
  2.1 Creating a new task
  2.2 Task status
  2.3 Requesting a backtrace
  2.4 Requesting a log file
  2.5 Task cleanup
  2.6 Limiting traffic
3. Retrace worker
4. Package repository
5. Traffic and load estimation
6. Security
  6.1 Clients
  6.2 Packages and debuginfo
7. Future work

----------------------------------------------------------------------
1. Overview
----------------------------------------------------------------------

A client sends a coredump (created by the Linux kernel) together with
some additional information to the server, and gets a backtrace
generation task ID in response. After some time, the client asks the
server for the task status, and when the task is done (a backtrace has
been generated from the coredump), the client downloads the
backtrace. If the backtrace generation fails, the client gets an error
code and downloads a log indicating what happened. Alternatively, the
client sends a coredump and keeps receiving the server response
message. The server then periodically sends the task status via the
response body, and delivers the resulting backtrace as soon as it is
ready.

The retrace server must be able to support multiple operating systems
and their releases (Fedora N-1, N, Rawhide, Branched Rawhide, RHEL),
and multiple architectures within a single installation.

The retrace server consists of the following parts:
1. abrt-retrace-server: an HTTP interface script handling the
   communication with clients, and task creation and management
2. abrt-retrace-worker: a program doing the environment preparation
   and coredump processing
3. package repository: a repository placed on the server containing
   all the application binaries, libraries, and debuginfo necessary
   for backtrace generation

----------------------------------------------------------------------
2. HTTP interface
----------------------------------------------------------------------

The HTTP interface application is a script written in Python. The
script is named abrt-retrace-server, and it uses the Python Web Server
Gateway Interface (WSGI, http://www.python.org/dev/peps/pep-0333/) to
interact with the web server. Administrators may use mod_wsgi
(http://code.google.com/p/modwsgi/) to run abrt-retrace-server on
Apache. mod_wsgi is a part of both Fedora 12 and RHEL 6. The Python
language is a good choice for this application, because it supports
HTTP handling well, and it is already used in ABRT.
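
As a rough illustration, the WSGI entry point could be shaped like the
minimal sketch below (the routing and names are illustrative
assumptions, not the final interface):

    # Minimal WSGI application sketch; the real script would route
    # /create, /<id>, /<id>/backtrace, and /<id>/log.
    def application(environ, start_response):
        path = environ.get("PATH_INFO", "")
        method = environ.get("REQUEST_METHOD", "GET")
        if path == "/create" and method == "POST":
            # ... archive checks and task creation would happen here ...
            start_response("201 Created",
                           [("X-Task-Id", "1"),
                            ("X-Task-Password", "...22 characters...")])
            return []
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return ["Unknown resource.\n"]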

Only secure (HTTPS) communication must be allowed for communication
with abrt-retrace-server, because coredumps and backtraces are private
data. Users may decide to publish their backtraces in a bug tracker
after reviewing them, but the retrace server doesn't do that. The
HTTPS requirement must be specified in the server's man page. The
server must support HTTP persistent connections to avoid frequent SSL
renegotiations. The server's manual page should include a
recommendation for administrators to check that persistent connections
are enabled.

----------------------------------------------------------------------
2.1 Creating a new task
----------------------------------------------------------------------

A client might create a new task by sending an HTTP request to the
https://server/create URL, and providing an archive as the request
content. The archive must contain crash data files. The crash data
files are a subset of the local /var/spool/abrt/ccpp-time-pid/
directory contents, so the client only needs to pack and upload them.

The server must support uncompressed tar archives, and tar archives
compressed with gzip and xz. Uncompressed archives are the most
efficient way for local network delivery, and gzip can be used there
as well because of its good compression speed.

The xz compression file format is well suited for a public server
setup (slow network), as it provides a good compression ratio, which
is important for compressing large coredumps, and it provides
reasonable compression/decompression speed and memory consumption (see
chapter '5 Traffic and load estimation' for the measurements). The XZ
Utils implementation with compression level 2 should be used to
compress the data.

The HTTP request for a new task must use the POST method. It must
contain proper 'Content-Length' and 'Content-Type' fields. If the
method is not POST, the server must return the "405 Method Not
Allowed" HTTP error code. If the 'Content-Length' field is missing,
the server must return the "411 Length Required" HTTP error code. If a
'Content-Type' other than 'application/x-tar', 'application/x-gzip',
or 'application/x-xz' is used, the server must return the "415
Unsupported Media Type" HTTP error code. If the 'Content-Length' value
is greater than a limit set in the server configuration file (30 MB by
default), or the real HTTP request size gets larger than the limit +
10 KB for headers, then the server must return the "413 Request Entity
Too Large" HTTP error code, and provide an explanation, including the
limit, in the response body. The limit must be changeable from the
server configuration file.
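
A minimal sketch of these checks, assuming a WSGI environ dictionary
(the 30 MB constant stands in for the configurable limit):

    # Illustrative request validation for /create; returns None when
    # the request is acceptable, or an (HTTP status, message) pair.
    MAX_ARCHIVE_SIZE = 30 * 1024 * 1024  # configurable in reality

    def check_create_request(environ):
        if environ.get("REQUEST_METHOD") != "POST":
            return "405 Method Not Allowed", "Use the POST method."
        length = environ.get("CONTENT_LENGTH")
        if not length:
            return "411 Length Required", "Content-Length is required."
        if environ.get("CONTENT_TYPE") not in ("application/x-tar",
                                               "application/x-gzip",
                                               "application/x-xz"):
            return ("415 Unsupported Media Type",
                    "Use tar, tar.gz, or tar.xz.")
        if int(length) > MAX_ARCHIVE_SIZE:
            return ("413 Request Entity Too Large",
                    "The upload limit is %d bytes." % MAX_ARCHIVE_SIZE)
        return None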

If there is less than 20 GB of free disk space in the
/var/spool/abrt-retrace directory, the server must return the "507
Insufficient Storage" HTTP error code. The server must return the same
HTTP error code if decompressing the received archive would cause the
free disk space to become less than 20 GB. The 20 GB limit must be
changeable from the server configuration file.

If the data from the received archive would take more than 500 MB of
disk space when uncompressed, the server must return the "413 Request
Entity Too Large" HTTP error code, and provide an explanation,
including the limit, in the response body. The size limit must be
changeable from the server configuration file. It can be set pretty
high, because coredumps, which take most of the disk space, are stored
on the server only temporarily, until the backtrace is generated. When
the backtrace is generated, the coredump is deleted by
abrt-retrace-worker, so most of the disk space is released.

The uncompressed data size for xz archives can be obtained by calling
`xz --list file.tar.xz`. The '--list' option has been implemented only
recently, so it might be necessary to implement a fallback that gets
the uncompressed data size by extracting the archive to stdout and
counting the extracted bytes, and to use this fallback if '--list'
doesn't work on the server. Likewise, the uncompressed data size for
gzip archives can be obtained by calling `gzip --list file.tar.gz`.
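
A sketch of this measurement with the fallback, in Python (the
`--robot` output format is taken from the xz manual; the column
position of the uncompressed size should be verified against the xz
version actually deployed):

    import subprocess

    def unpacked_size_xz(archive):
        # Try the machine-readable listing first; the "totals" row
        # carries the uncompressed size in its fifth column.
        try:
            out = subprocess.Popen(["xz", "--robot", "--list", archive],
                                   stdout=subprocess.PIPE).communicate()[0]
            for line in out.splitlines():
                fields = line.split("\t")
                if fields[0] == "totals":
                    return int(fields[4])
        except OSError:
            pass
        # Fallback for old xz: decompress to a pipe and count the bytes.
        proc = subprocess.Popen(["xz", "--decompress", "--stdout", archive],
                                stdout=subprocess.PIPE)
        size = 0
        while True:
            chunk = proc.stdout.read(65536)
            if not chunk:
                break
            size += len(chunk)
        proc.wait()
        return size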

If an upload from a client succeeds, the server creates a new
directory /var/spool/abrt-retrace/<id> and extracts the received
archive into it. Then it checks that the directory contains all the
required files, checks their sizes, and then sends an HTTP
response. After that it spawns a subprocess with abrt-retrace-worker
on that directory.

To support multiple architectures, the retrace server needs a GDB
package compiled separately for every supported target architecture
(see the avr-gdb package in Fedora for an example). This is a
technically and economically better solution than using a standalone
machine for every supported architecture and re-sending coredumps
depending on the client's architecture. However, GDB's support for
using a target architecture different from the host architecture seems
to be fragile. If it doesn't work, QEMU user mode emulation should be
tried as an alternative approach.

The following files from the local crash directory are required to be
present in the archive: coredump, architecture, release, packages
(this one does not exist yet). If one or more files are not present in
the archive, or some other file is present in the archive, the server
must return the "403 Forbidden" HTTP error code. If the size of any
file except the coredump exceeds 100 KB, the server must return the
"413 Request Entity Too Large" HTTP error code, and provide an
explanation, including the limit, in the response body. The 100 KB
limit must be changeable from the server configuration file.

If the file check succeeds, the server HTTP response must have the
"201 Created" HTTP code. The response must include the following HTTP
header fields:
- "X-Task-Id" containing a new server-unique numerical task id
- "X-Task-Password" containing a newly generated password, required to
  access the result
- "X-Task-Est-Time" containing a number of seconds the server
  estimates it will take to generate the backtrace

The 'X-Task-Password' is a random alphanumeric ([a-zA-Z0-9]) sequence
22 characters long. 22 alphanumeric characters correspond to a 128-bit
password, because [a-zA-Z0-9] has 62 characters, and 2^128 < 62^22.
The source of randomness must be, directly or indirectly,
/dev/urandom. The rand() function from glibc and similar functions
from other libraries cannot be used because of their poor
characteristics (in several aspects). The password must be stored in
the /var/spool/abrt-retrace/<id>/password file, so passwords sent by a
client in subsequent requests can be verified.
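
A minimal sketch of such a generator in Python; random.SystemRandom
draws from os.urandom, i.e. /dev/urandom, which satisfies the
requirement above:

    import random
    import string

    ALPHABET = string.ascii_letters + string.digits  # 62 characters

    def generate_task_password(length=22):
        # 62^22 > 2^128, so 22 characters carry more than 128 bits.
        rng = random.SystemRandom()
        return "".join(rng.choice(ALPHABET) for _ in range(length))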

The task id is intentionally not used as a password, because it is
desirable to keep the id readable and memorable for humans.
Password-like ids would be a loss when a user authentication mechanism
is added and server-generated passwords are no longer necessary.

The algorithm for the "X-Task-Est-Time" time estimation should take
the previous analyses of coredumps with the same corresponding package
name into account. The server should store a simple history in an
SQLite database to know how long it takes to generate a backtrace for
a certain package. It could be as simple as this (see the sketch after
the list):
- initialization step one: "CREATE TABLE package_time (id INTEGER
  PRIMARY KEY AUTOINCREMENT, package, release, time)"; we need the
  'id' for the database cleanup, to know the insertion order of rows,
  so the "AUTOINCREMENT" is important here; the 'package' is the
  package name without the version and release numbers, the 'release'
  column stores the operating system, and the 'time' is the number of
  seconds it took to generate the backtrace
- initialization step two: "CREATE INDEX package_release ON
  package_time (package, release)"; we compute the time only for a
  single package on a single supported OS release per query, so it
  makes sense to create an index to speed it up
- when a task is finished: "INSERT INTO package_time (package,
  release, time) VALUES ('??', '??', '??')"
- to get the average time: "SELECT AVG(time) FROM package_time WHERE
  package == '??' AND release == '??'"; the arithmetic mean seems to
  be sufficient here
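
A sketch of the two runtime queries using Python's sqlite3 module,
with parameter binding in place of the '??' placeholders (function
names are illustrative):

    import sqlite3

    def record_retrace_time(db_path, package, release, seconds):
        # Called when a task finishes.
        conn = sqlite3.connect(db_path)
        conn.execute("INSERT INTO package_time (package, release, time)"
                     " VALUES (?, ?, ?)", (package, release, seconds))
        conn.commit()
        conn.close()

    def average_retrace_time(db_path, package, release):
        # Returns None when no history exists for the package yet.
        conn = sqlite3.connect(db_path)
        row = conn.execute("SELECT AVG(time) FROM package_time"
                           " WHERE package = ? AND release = ?",
                           (package, release)).fetchone()
        conn.close()
        return row[0]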

So the server knows that crashes from an OpenOffice.org package take
5 minutes to process on average, and it can return the value 300
(seconds) in the field. The client does not waste time asking about
that task every 20 seconds; the first status request comes after
300 seconds. And even when the package changes (rebases etc.), the
database provides good estimates after some time (the '2.5 Task
cleanup' chapter describes how the data are pruned).

The server response HTTP body is generated and sent gradually as the
task is performed. The client chooses either to receive the body, or
to terminate after getting all the headers and to ask for the status
and the backtrace asynchronously.

The server re-sends the output of abrt-retrace-worker (its stdout and
stderr) to the response body. In addition, a line with the task status
is added to the body in the form `X-Task-Status: PENDING` every
5 seconds. When the worker process ends, either a FINISHED_SUCCESS or
a FINISHED_FAILURE status line is sent. If it's FINISHED_SUCCESS, the
backtrace is attached after this line. Then the response body is
closed.
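
The body can be produced by a WSGI generator; a rough sketch, assuming
the worker's combined output is available as a pipe (the 5-second
cadence here is only approximated between output lines; a real
implementation would poll with a timeout):

    import os
    import time

    def stream_task_body(worker_pipe, backtrace_path):
        last_status = time.time()
        for line in iter(worker_pipe.readline, ""):
            yield line
            if time.time() - last_status >= 5:
                yield "X-Task-Status: PENDING\n"
                last_status = time.time()
        if os.path.exists(backtrace_path):
            yield "X-Task-Status: FINISHED_SUCCESS\n"
            yield open(backtrace_path).read()
        else:
            yield "X-Task-Status: FINISHED_FAILURE\n"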

----------------------------------------------------------------------
2.2 Task status
----------------------------------------------------------------------

A client might request a task status by sending an HTTP GET request to
the https://someserver/<id> URL, where <id> is the numerical task id
returned in the "X-Task-Id" field by https://someserver/create. If the
<id> is not in a valid format, or the task <id> does not exist, the
server must return the "404 Not Found" HTTP error code.

The client request must contain the "X-Task-Password" field, and its
content must match the password stored in the
/var/spool/abrt-retrace/<id>/password file. If the password is not
valid, the server must return the "403 Forbidden" HTTP error code.

If the checks pass, the server returns the "200 OK" HTTP code, and
includes a field "X-Task-Status" containing one of the following
values: "FINISHED_SUCCESS", "FINISHED_FAILURE", "PENDING".

The field contains "FINISHED_SUCCESS" if the file
/var/spool/abrt-retrace/<id>/backtrace exists. The client might get
the backtrace from the https://someserver/<id>/backtrace URL. The log
can be downloaded from the https://someserver/<id>/log URL, and it
might contain warnings about missing debuginfos etc.

The field contains "FINISHED_FAILURE" if the file
/var/spool/abrt-retrace/<id>/backtrace does not exist, but the file
/var/spool/abrt-retrace/<id>/retrace-log exists. The retrace-log file
containing error messages can be downloaded by the client from the
https://someserver/<id>/log URL.

The field contains "PENDING" if neither file exists. The client should
ask again after 10 seconds or later.
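
The mapping from on-disk state to status value is simple enough to
show directly (a sketch; the task directory layout follows the
description above):

    import os

    def task_status(task_dir):
        # task_dir is /var/spool/abrt-retrace/<id>
        if os.path.exists(os.path.join(task_dir, "backtrace")):
            return "FINISHED_SUCCESS"
        if os.path.exists(os.path.join(task_dir, "retrace-log")):
            return "FINISHED_FAILURE"
        return "PENDING"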

----------------------------------------------------------------------
2.3 Requesting a backtrace
----------------------------------------------------------------------

A client might request a backtrace by sending an HTTP GET request to
the https://someserver/<id>/backtrace URL, where <id> is the numerical
task id returned in the "X-Task-Id" field by
https://someserver/create. If the <id> is not in a valid format, or
the task <id> does not exist, the server must return the "404 Not
Found" HTTP error code.

The client request must contain the "X-Task-Password" field, and its
content must match the password stored in the
/var/spool/abrt-retrace/<id>/password file. If the password is not
valid, the server must return the "403 Forbidden" HTTP error code.

If the file /var/spool/abrt-retrace/<id>/backtrace does not exist, the
server must return the "404 Not Found" HTTP error code. Otherwise it
returns the file contents, and the "Content-Type" field must contain
"text/plain".

----------------------------------------------------------------------
2.4 Requesting a log file
----------------------------------------------------------------------

A client might request a task log by sending an HTTP GET request to
the https://someserver/<id>/log URL, where <id> is the numerical task
id returned in the "X-Task-Id" field by https://someserver/create. If
the <id> is not in a valid format, or the task <id> does not exist,
the server must return the "404 Not Found" HTTP error code.

The client request must contain the "X-Task-Password" field, and its
content must match the password stored in the
/var/spool/abrt-retrace/<id>/password file. If the password is not
valid, the server must return the "403 Forbidden" HTTP error code.

If the file /var/spool/abrt-retrace/<id>/retrace-log does not exist,
the server must return the "404 Not Found" HTTP error code. Otherwise
it returns the file contents, and the "Content-Type" field must
contain "text/plain".

----------------------------------------------------------------------
2.5 Task cleanup
----------------------------------------------------------------------

Tasks that were created more than 5 days ago must be deleted, because
tasks occupy disk space (not so much space, because the coredumps are
deleted after the retrace, and only the backtraces and configuration
remain). A shell script "abrt-retrace-clean" must check the creation
time and delete the directories in /var/spool/abrt-retrace. The server
administrator is expected to set up cron to call the script once a
day. This assumption must be mentioned in the abrt-retrace-clean
manual page.
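
For illustration, the cron entry could look like this (the install
path and the time of day are assumptions):

    # /etc/cron.d/abrt-retrace-clean: delete old tasks once a day
    30 3 * * * root /usr/sbin/abrt-retrace-clean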

The database containing packages and processing times should also be
regularly pruned to remain small and provide data quickly. The cleanup
script should delete some rows for packages with too many entries
(see the sketch after this list):
a. get a list of packages from the database: "SELECT DISTINCT package,
   release FROM package_time"
b. for every package, get the row count: "SELECT COUNT(*) FROM
   package_time WHERE package == '??' AND release == '??'"
c. for every package with a row count larger than 100, some rows
   must be removed so that only the newest 100 rows remain in the
   database:
   - to get the lowest row id which should be kept, execute "SELECT id
     FROM package_time WHERE package == '??' AND release == '??' ORDER
     BY id LIMIT 1 OFFSET ??", where the OFFSET is the total number of
     rows for that single package minus 100
   - then all the older rows can be deleted by executing "DELETE FROM
     package_time WHERE package == '??' AND release == '??' AND id <
     ??"
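
A sketch of steps a.-c. in Python (the keep-100 policy matches the
list above; the function name is illustrative):

    import sqlite3

    def prune_history(db_path, keep=100):
        conn = sqlite3.connect(db_path)
        pairs = conn.execute(
            "SELECT DISTINCT package, release FROM package_time")
        for package, release in pairs.fetchall():
            count = conn.execute(
                "SELECT COUNT(*) FROM package_time"
                " WHERE package = ? AND release = ?",
                (package, release)).fetchone()[0]
            if count <= keep:
                continue
            # The row at OFFSET count-keep is the oldest row to keep;
            # everything with a smaller id is deleted.
            keep_id = conn.execute(
                "SELECT id FROM package_time WHERE package = ?"
                " AND release = ? ORDER BY id LIMIT 1 OFFSET ?",
                (package, release, count - keep)).fetchone()[0]
            conn.execute(
                "DELETE FROM package_time WHERE package = ?"
                " AND release = ? AND id < ?",
                (package, release, keep_id))
        conn.commit()
        conn.close()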

----------------------------------------------------------------------
2.6 Limiting traffic
----------------------------------------------------------------------

The maximum number of simultaneously running tasks must be limited to
20 by the server. The limit must be changeable from the server
configuration file. If a new request comes when the server is fully
occupied, the server must return the "503 Service Unavailable" HTTP
error code.

The archive extraction, chroot preparation, and gdb analysis are
mostly limited by the hard drive size and speed.

----------------------------------------------------------------------
3. Retrace worker
----------------------------------------------------------------------

The worker (the abrt-retrace-worker binary) gets a
/var/spool/abrt-retrace/<id> directory as its input. The worker reads
the operating system name and version, the coredump, and the list of
packages needed for retracing (the package containing the binary which
crashed, and the packages with the libraries that are used by the
binary).

The worker prepares a new "chroot" subdirectory with the packages,
their debuginfo, and gdb installed. In other words, a new directory
/var/spool/abrt-retrace/<id>/chroot is created, and the packages are
unpacked or installed into this directory, so, for example, gdb ends
up as /var/.../<id>/chroot/usr/bin/gdb.

After the "chroot" subdirectory is prepared, the worker moves the
coredump there and changes the root (using the chroot system function)
of a child script there. The child script runs gdb on the coredump,
and gdb sees the corresponding crashed binary, all the debuginfo, and
the proper versions of all the libraries in the right places.

When the gdb run is finished, the worker copies the resulting
backtrace to the /var/spool/abrt-retrace/<id>/backtrace file and
stores a log from the whole chroot process to the retrace-log file in
the same directory. Then it removes the chroot directory.
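
A high-level sketch of these worker steps (paths, the gdb command
line, and the crashed-binary path are illustrative assumptions):

    import os
    import shutil
    import subprocess

    def run_retrace(task_dir, crashed_binary):
        chroot_dir = os.path.join(task_dir, "chroot")
        # ... the chroot has already been populated with the packages,
        # their debuginfo, and gdb ...
        shutil.move(os.path.join(task_dir, "coredump"),
                    os.path.join(chroot_dir, "coredump"))
        backtrace = open(os.path.join(task_dir, "backtrace"), "w")
        log = open(os.path.join(task_dir, "retrace-log"), "w")
        # Run gdb with its root changed to the prepared directory.
        subprocess.call(["chroot", chroot_dir, "/usr/bin/gdb", "-batch",
                         "-ex", "thread apply all backtrace full",
                         crashed_binary, "/coredump"],
                        stdout=backtrace, stderr=log)
        shutil.rmtree(chroot_dir)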

The GDB installed into the chroot must be able to:
- run on the server (same architecture, or we can use QEMU user space
  emulation, see
  http://wiki.qemu.org/download/qemu-doc.html#QEMU-User-space-emulator)
- process the coredump (possibly from another architecture): that
  means we need a special GDB for every supported architecture
- handle coredumps created in an environment with prelink enabled
  (should not be a problem, see
  http://sourceware.org/ml/gdb/2009-05/msg00175.html)
- use the libc, zlib, readline, ncurses, expat, and Python packages,
  while the version numbers required by the coredump might be
  different from what is required by the GDB

The gdb might fail to run with certain combinations of package
dependencies. Nevertheless, we need to provide the libc/Python/*
package versions which are required by the coredump. If we did not do
that, the backtraces generated from such an environment would be of
lower quality. Consider a coredump which was caused by a crash of a
Python application on a client, and which we analyze on the retrace
server with a completely different version of Python, because the
client's Python version is not compatible with our GDB.

We can solve the issue by installing the GDB package dependencies
first, moving their binaries to some safe place (/lib/gdb in the
chroot), and creating an /etc/ld.so.preload file pointing to that
place, or setting LD_LIBRARY_PATH. Then we can unpack the libc
binaries and the other packages, in the versions required by the
coredump, to the common paths, and the GDB would run happily, using
the libraries from /lib/gdb and not those from /lib and /usr/lib. This
approach can use standard GDB builds with various target
architectures: gdb, gdb-i386, gdb-ppc64, gdb-s390 (nonexistent in
Fedora/EPEL at the time of writing this).

The GDB and its dependencies are stored separately from the packages
used as data for the coredump processing. A single combination of GDB
and its dependencies can be used across all supported operating
systems to generate backtraces.

The retrace worker must be able to prepare a chroot-ready environment
for a certain supported operating system, which may be different from
the retrace server's operating system. It needs to fake the /dev
directory and create some basic files in /etc, like passwd and
hosts. We can use the "mock" library (https://fedorahosted.org/mock/)
to do that, as it does almost what we need (but not exactly, as it has
a strong focus on preparing the environment for rpmbuild and running
it), or we can come up with our own solution, while stealing some code
from the mock library. The /usr/bin/mock executable is entirely
unsuitable for the retrace server, but the underlying Python library
can be used. So if we would like to use mock, an ABRT-specific
interface to the mock library must be written, or the retrace worker
must be written in Python and use the mock Python library directly.

We should save time and disk space by extracting only binaries and
dynamic libraries from the packages for the coredump analysis, and
omit all other files. We can save even more time and disk space by
extracting only the libraries and binaries really referenced by the
coredump (eu-unstrip tells us the list). Packages should not be
_installed_ to the chroot, they should be _extracted_ only, because we
use them as a data source, and we never run them.
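
For illustration, the list of referenced files could be obtained from
eu-unstrip roughly like this (the exact output format should be
checked against the deployed elfutils version):

    import subprocess

    def referenced_files(coredump):
        # `eu-unstrip -n --core=<file>` prints one module per line:
        # address range, build id, binary file, debuginfo file, name.
        out = subprocess.Popen(
            ["eu-unstrip", "-n", "--core=" + coredump],
            stdout=subprocess.PIPE).communicate()[0]
        files = []
        for line in out.splitlines():
            fields = line.split()
            if len(fields) >= 3 and fields[2].startswith("/"):
                files.append(fields[2])
        return files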

Another idea to be considered is that we can avoid the package
extraction if we can teach GDB to read the dynamic libraries, the
binary, and the debuginfo directly from the RPM packages. We would
provide a backend to GDB which can do that, and provide a tiny
front-end program which tells the backend which RPMs it should use and
then runs the GDB command loop. The result would be a GDB
wrapper/extension we need to maintain, but it should end up pretty
small. We would use Python to write our extension, as we do not want
to (inelegantly) maintain a patch against the GDB core. We need to ask
the GDB people if the Python interface is capable of handling this
idea, and how much work it would be to implement it.

----------------------------------------------------------------------
4. Package repository
----------------------------------------------------------------------

We should support every Fedora release with all the packages that ever
made it to the updates and updates-testing repositories. In order to
provide all those packages, a local repository is maintained for every
supported operating system. The debuginfos might be provided by a
debuginfo server in the future (it would save the server disk
space). We should support the usage of local debuginfo first, and add
the debuginfofs support later.

A repository with Fedora packages must be maintained locally on the
server to provide good performance and to provide data from older
packages already removed from the official repositories. We need a
package downloader, which scans the Fedora servers for new packages
and downloads them so they are immediately available.

Older versions of packages are regularly deleted from the updates and
updates-testing repositories. We must support older versions of
packages, because that is one of the two major pain points that the
retrace server is supposed to solve (the other one is the slowness of
debuginfo download and the debuginfo disk space requirements).

A script, abrt-reposync, must download packages from the Fedora
repositories, but it must not delete older versions of the
packages. The retrace server administrator is supposed to call this
script using cron every ~6 hours. This expectation must be documented
in the abrt-reposync manual page. The script can use wget, rsync, or
the reposync tool to get the packages. The remote yum source
repositories must be configured in a configuration file or files
(/etc/yum.repos.d might be used).
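
For illustration, the cron entry and the underlying reposync call
might look like this (the install path, repository id, and target
directory are assumptions; reposync is part of yum-utils and keeps
older package versions unless told otherwise):

    # /etc/cron.d/abrt-reposync: sync the repositories every 6 hours
    0 */6 * * * root /usr/sbin/abrt-reposync

    # inside the script, for each configured repository:
    reposync --repoid=updates --download_path=/var/cache/abrt-repo/fedora12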

When abrt-reposync is used to sync with the Rawhide repository,
unneeded packages (those with a newer version available) must be
removed after residing for one week alongside the newer package in the
same repository.

All the unneeded content from the newly downloaded packages should be
removed to save disk space and speed up chroot creation. We need just
the binaries and dynamic libraries, and that is a tiny part of the
package contents.

The packages should be downloaded to a local repository in
/var/cache/abrt-repo/{fedora12,fedora12-debuginfo,...}.

----------------------------------------------------------------------
5. Traffic and load estimation
----------------------------------------------------------------------

2500 bugs are reported from ABRT every month. Approximately 7.3% of
those are Python exceptions, which do not need a retrace server. That
means that 2315 bugs need a retrace server. That is 77 bugs per day,
or 3.3 bugs every hour on average. Occasional spikes might be much
higher (imagine a user who decides to report all his 8 crashes from
the last month).

We should probably not try to predict whether the monthly bug count
will go up or down. New, untested versions of software are added to
Fedora, but on the other hand most software matures and becomes less
crashy. So let's assume that the bug count stays approximately the
same.

Test crashes (these suggest that we should use `xz -2` to compress
coredumps):
- firefox with 7 tabs with random pages opened
   - coredump size: 172 MB
   - xz:
     - compression level 6 - default:
       - compression time on my machine: 32.5 sec
       - compressed coredump: 5.4 MB
       - decompression time: 2.7 sec
     - compression level 3:
       - compression time on my machine: 23.4 sec
       - compressed coredump: 5.6 MB
       - decompression time: 1.6 sec
     - compression level 2:
       - compression time on my machine: 6.8 sec
       - compressed coredump: 6.1 MB
       - decompression time: 3.7 sec
     - compression level 1:
       - compression time on my machine: 5.1 sec
       - compressed coredump: 6.4 MB
       - decompression time: 2.4 sec
   - gzip:
     - compression level 9 - highest:
       - compression time on my machine: 7.6 sec
       - compressed coredump: 7.9 MB
       - decompression time: 1.5 sec
     - compression level 6 - default:
       - compression time on my machine: 2.6 sec
       - compressed coredump: 8 MB
       - decompression time: 2.3 sec
     - compression level 3:
       - compression time on my machine: 1.7 sec
       - compressed coredump: 8.9 MB
       - decompression time: 1.7 sec
- thunderbird with thousands of emails opened
   - coredump size: 218 MB
   - xz:
     - compression level 6 - default:
       - compression time on my machine: 60 sec
       - compressed coredump size: 12 MB
       - decompression time: 3.6 sec
     - compression level 3:
       - compression time on my machine: 42 sec
       - compressed coredump size: 13 MB
       - decompression time: 3.0 sec
     - compression level 2:
       - compression time on my machine: 10 sec
       - compressed coredump size: 14 MB
       - decompression time: 3.0 sec
     - compression level 1:
       - compression time on my machine: 8.3 sec
       - compressed coredump size: 15 MB
       - decompression time: 3.2 sec
   - gzip:
     - compression level 9 - highest:
       - compression time on my machine: 14.9 sec
       - compressed coredump size: 18 MB
       - decompression time: 2.4 sec
     - compression level 6 - default:
       - compression time on my machine: 4.4 sec
       - compressed coredump size: 18 MB
       - decompression time: 2.2 sec
     - compression level 3:
       - compression time on my machine: 2.7 sec
       - compressed coredump size: 20 MB
       - decompression time: 3 sec
- evince with 2 pdfs (1 and 42 pages) opened:
   - coredump size: 73 MB
   - xz:
     - compression level 2:
       - compression time on my machine: 2.9 sec
       - compressed coredump size: 3.6 MB
       - decompression time: 0.7 sec
     - compression level 1:
       - compression time on my machine: 2.5 sec
       - compressed coredump size: 3.9 MB
       - decompression time: 0.7 sec
- OpenOffice.org Impress with 25 pages presentation:
   - coredump size: 116 MB
   - xz:
     - compression level 2:
       - compression time on my machine: 7.1 sec
       - compressed coredump size: 12 MB
       - decompression time: 2.3 sec

So let's imagine that some users want to report their crashes at
approximately the same time. Here is what the retrace server must
handle:
- 2 OpenOffice crashes
- 2 evince crashes
- 2 thunderbird crashes
- 2 firefox crashes

We will use the xz archiver with compression level 2 on ABRT's side to
compress the coredumps. So the users spend 53.6 seconds in total
packaging the coredumps.

The packaged coredumps take 71.4 MB, and the retrace server must
receive that data.

The server unpacks the coredumps (perhaps at the same time), so they
need 1158 MB of disk space on the server. The decompression takes
19.4 seconds.

Several hundred megabytes will be needed to install all the required
binaries and debuginfos for every chroot (8 chroots with 1 GB each =
8 GB, but this seems like an extreme, maximal case). Some space will
be saved by using a debuginfofs.

Note that most applications are not as heavyweight as OpenOffice and
Firefox.

----------------------------------------------------------------------
6. Security
----------------------------------------------------------------------

The retrace server communicates with two other entities: it accepts
coredumps from users, and it downloads debuginfos and packages from
the distribution repositories.

General protection from GDB flaws and malicious data is provided by
the chroot. The GDB accesses the debuginfos, packages, and the
coredump from within the chroot, unable to access the retrace server's
environment. We should consider setting a disk quota for every chroot
directory, and limiting the GDB's access to resources using cgroups.

An SELinux policy should be written for both the retrace server's HTTP
interface and the retrace worker.

----------------------------------------------------------------------
6.1 Clients
----------------------------------------------------------------------

The clients, which use the retrace server and send coredumps to it,
must fully trust the retrace server administrator. The server
administrator must not try to get sensitive data from the client
coredumps. This seems to be a major weak point of the retrace server
idea. However, users of an operating system already trust the OS
provider in various important matters. So when the retrace server is
operated by the operating system provider, it might be acceptable to
users.

We cannot avoid sending the clients' coredumps to the retrace server
if we want to generate quality backtraces containing the values of
variables. Minidumps are not an acceptable solution, as they lower the
quality of the resulting backtraces, while not improving user
security.

Can the retrace server trust the clients? We must know what a
malicious client can achieve by crafting a nonstandard coredump, which
will be processed by the server's GDB. We should ask the GDB experts
about this.

Another question is whether we can allow users to provide some
packages and debuginfo together with a coredump. That might be useful
for users who run the operating system with only some minor
modifications and still want to use the retrace server. So they send a
coredump together with a few nonstandard packages. The retrace server
uses the nonstandard packages together with the OS packages to
generate the backtrace. Is it safe? We must know what a malicious
client can achieve by crafting a special binary and debuginfo, which
will be processed by the server's GDB.

----------------------------------------------------------------------
6.2 Packages and debuginfo
----------------------------------------------------------------------

We can safely download packages and debuginfo from the distribution,
as the packages are signed by the distribution, and the package origin
can be verified.

When the debuginfo server is finished, the retrace server can safely
use it, as the data will also be signed.

----------------------------------------------------------------------
7. Future work
----------------------------------------------------------------------

1. Coredump stripping. Jan Kratochvil: in a test with an
OpenOffice.org presentation, the kernel core file has 181 MB, and its
xz -2 archive has 65 MB. According to `set target debug 1', GDB reads
only 131406 bytes of it (incl. the NOTE segment).

2. User management for the HTTP interface. We need multiple
authentication sources (x509 for RHEL).

3. Make the architecture, release, and packages files, which must be
included in the archive when creating a task, optional. Allow
uploading a coredump without involving tar: just coredump,
coredump.gz, or coredump.xz.

4. Handle non-standard packages (provided by the user).