#6655 json serialization performance issues
Closed None Opened 7 years ago by cheimes.

FreeIPA's json serialization is slow. In my vault performance analysis #6651, json serialization and its helper function were 3 or 4 functions in the top 15. The other slow functions were mostly I/O and network bound calls (SSL to Dogtag, GSSAPI, LDAP).

sort_key and ident

ipaserver.rpcserver.jsonserver.marshall() uses json.dumps() with sort_keys=True and indent=4. Both sort_keys and indent are problematic. Either one of the arguments forces Python's json parser to fall back to a Python-only implementation of the iter encoder, https://hg.python.org/cpython/file/tip/Lib/json/encoder.py#l246 .

I noticed the problem while I was analyzing WSGI profiler data with snakeviz. The profile dumps for 100 requests contain over 2.5 mio calls to isinstance() and almost 2.5 mio calls to _iterencode_dict().

  2790767    0.811    0.000    0.926    0.000 {isinstance}
2495797/519934    0.776    0.000    1.205    0.000 /usr/lib64/python2.7/json/encoder.py:341(_iterencode_dict)

json_encode_binary

The function json_encode_binary consumes a lot of time and seems to call isinstance() pretty often, too. The encoder function can be implemented more efficiently as custom JSONEncoder with some additional tricks.

138038/299    0.274    0.000    0.523    0.002 /usr/lib/python2.7/site-packages/ipalib/rpc.py:276(json_encode_binary)

ipalib.util.json_serialize is yet another JSON serializer that pops up in my profiling data.

https://github.com/freeipa/freeipa/pull/459 optimizes the performance of RPC client and server serialization and deserialization.

We might want to consider a faster JSON encoder/decoder, too. ujson is fast but lacks obj_hook feature.
https://github.com/python-rapidjson/python-rapidjson is powerful and fast but has no Python 2 support.

master:

  • 8159c28 Faster JSON encoder/decoder
  • b12b1e4 Convert list to tuples
  • 3cac037 Pretty print JSON in debug mode (debug level >= 2)
  • 2ff07b9 Fix test, nested lists are no longer converted to nested tuples
  • 1d7fcfe Explain more performance tricks in doc string

Please leave it open until I have investigated ipalib.util.json_serialize.

json_serialize definitely needs some love. ipalib.parameters.Param.__init__, too.

127.0.0.1 - - [15/Feb/2017 21:26:43] "POST /ipa/session/json HTTP/1.1" 200 -
ipa: INFO: [jsonserver_session] admin@IPA.EXAMPLE: json_metadata(None, None, command=u'all', version=u'2.218'): SUCCESS
--------------------------------------------------------------------------------
PATH: '/session/json'
         2340515 function calls (2131924 primitive calls) in 1.142 seconds

   Ordered by: internal time, call count
   List reduced from 432 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3556    0.201    0.000    0.452    0.000 /home/heimes/redhat/freeipa/ipalib/parameters.py:441(__init__)
187490/446    0.148    0.000    1.004    0.002 /home/heimes/redhat/freeipa/ipalib/util.py:58(json_serialize)
   718435    0.135    0.000    0.135    0.000 {isinstance}
     3999    0.085    0.000    0.256    0.000 /home/heimes/redhat/freeipa/ipalib/parameters.py:954(__json__)
   173558    0.076    0.000    0.076    0.000 {hasattr}
143774/143772    0.053    0.000    0.053    0.000 /home/heimes/redhat/freeipa/ipalib/base.py:123(__setattr__)
   104200    0.037    0.000    0.072    0.000 {setattr}
        1    0.032    0.032    0.032    0.032 {built-in method sasl_interactive_bind_s}
   4450/1    0.031    0.000    0.049    0.049 /home/heimes/redhat/freeipa/ipalib/rpc.py:388(_enc_dict)
   234842    0.026    0.000    0.027    0.000 {getattr}
 4445/446    0.025    0.000    0.460    0.001 /home/heimes/redhat/freeipa/ipalib/util.py:62(<dictcomp>)
   234451    0.019    0.000    0.019    0.000 {method 'get' of 'dict' objects}
        1    0.018    0.018    0.018    0.018 /usr/lib64/python2.7/json/encoder.py:212(iterencode)
13038/892    0.015    0.000    0.047    0.000 /home/heimes/redhat/freeipa/ipalib/rpc.py:380(_enc_list)
     3556    0.012    0.000    0.012    0.000 /home/heimes/redhat/freeipa/ipalib/parameters.py:261(parse_param_spec)
    24961    0.012    0.000    0.012    0.000 {_codecs.utf_8_decode}
    24961    0.011    0.000    0.030    0.000 {method 'decode' of 'str' objects}
   107753    0.010    0.000    0.010    0.000 /home/heimes/redhat/freeipa/ipalib/parameters.py:506(<genexpr>)
     7176    0.008    0.000    0.012    0.000 /home/heimes/redhat/freeipa/ipalib/text.py:248(as_unicode)
     1973    0.007    0.000    0.273    0.000 /home/heimes/redhat/freeipa/ipalib/parameters.py:725(clone_retype)
    24961    0.007    0.000    0.019    0.000 /usr/lib64/python2.7/encodings/utf_8.py:15(decode)
     7388    0.006    0.000    0.006    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
9958/9062    0.006    0.000    0.009    0.000 /home/heimes/redhat/freeipa/ipalib/util.py:846(__get__)
    77523    0.006    0.000    0.006    0.000 {method 'pop' of 'dict' objects}
     1975    0.006    0.000    0.273    0.000 /home/heimes/redhat/freeipa/ipalib/parameters.py:1433(__init__)
     7565    0.006    0.000    0.006    0.000 /usr/lib64/python2.7/re.py:230(_compile)
        1    0.006    0.006    0.006    0.006 {method 'encode' of 'str' objects}
     3364    0.005    0.000    0.010    0.000 /home/heimes/redhat/freeipa/ipalib/plugable.py:288(__enumerate)
     7386    0.005    0.000    0.017    0.000 /usr/lib64/python2.7/re.py:138(match)
     7475    0.005    0.000    0.018    0.000 /home/heimes/redhat/freeipa/ipalib/text.py:264(__json__)

For Vault use case, wouldn't it be better to measure it with Vault commands? Using json_metadata command is good for measurement of JSON serialization and parameter init. Mostly because json_metadata produce really big list - pretty printed, without compression it has about 0.5MB. But in Vault use case json serialization and param init won't be probably a problem because response is most-likely small(depends on secret size) and the command doesn't iterate through all commands in IPA.

The issue is about JSON performance in general. Vault is merely the main driver to fix some critical paths ASAP. The performance issue ipalib.util.json_serialize should be fixed during the development cycle of 4.5, too. It's a low hanging fruit that will dramatically improve performance for some operations with less than an hour of coding.

How is json metadata used? Does the client call when it bootstraps the API or is it just used in the API browser?

Param.__init___ and gzip compression are unrelated issues. For gzip compression we can simply use Apache mod_gzip to handle compression on the fly. It's just a matter of adding SetOutputFilter DEFLATE to Apache ipa.conf.

Petr, I've created #6681 for compression.

JFTR, json_metadata command is used in Web UI right after authentication.

There are two calls:

{"method":"json_metadata","params":[[],{"object":"all","version":"2.217"}]}

And:

{"method":"json_metadata","params":[[],{"command":"all","version":"2.217"}]}

First takes 453ms, second 2.10s on my testing vm. It might be viewed as long time, but it happens only once per one browser tab. It is significantly less time then Web UI of other products. Therefore I don't see a need to optimize it now. We have a ticket to cache the metadata though.

I added issue #6693 for json_serialize().

Metadata Update from @cheimes:
- Issue assigned to someone
- Issue set to the milestone: FreeIPA 4.5

7 years ago

master:

  • 8295848 rpc: fix crash in verbose mode

Metadata Update from @mbasti:
- Issue close_status updated to: None (was: Fixed)

7 years ago

Login to comment on this ticket.

Metadata