#1651 anonymous crash data submission
Closed: Fixed None Opened 14 years ago by walters.

To support various applications such as crash reporting, it'd be nice if Fedora infrastructure provided a scalable key/value data storage system like Amazon's S3 (http://aws.amazon.com/s3/).

Once we have an explicit data storage API, we can write a server which accepts, say, crash submissions via HTTP POST. Later we can write a tool which analyzes the crashes, or reuse socorro.


I created something similar in https://fedorahosted.org/fedora-data/ in that it uses trac xmlrpc to upload and store information. See the source for examples of anaconda uploading.

Ah, interesting. What's it backed by? Unix filesystem?

Would something larger in scope, such as anonymous (not tied to a FAS account) crash reports, and possibly something like https://www.redhat.com/archives/fedora-devel-list/2009-August/msg01255.html be doable with a server just accepting data like your XML-RPC thing? I mean in terms of network bandwidth, storage availability, etc. Let me try to make up some napkin-math numbers. Say there are a million Fedora users, and 75% of them opt to send feedback. Then let's say the gzipped feedback is 16k (tested with fpaste --sysinfo + gzip), submitted monthly.

That's about 12G/month (750,000 reports x 16k each).
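
Spelling that arithmetic out as a small sketch: the constant names are illustrative only, and it assumes exactly one 16 KiB report per opted-in user per month. With those inputs the total lands at roughly 12 GB per month.

```python
# Back-of-envelope estimate using the figures from the comment above.
USERS = 1_000_000          # assumed size of the Fedora user base
OPT_IN_RATE = 0.75         # fraction of users who opt to send feedback
REPORT_BYTES = 16 * 1024   # one gzipped report, taken as 16 KiB


def monthly_bytes(users: int, opt_in_rate: float, report_bytes: int) -> float:
    """Bytes received per month with one report per opted-in user."""
    return users * opt_in_rate * report_bytes


total = monthly_bytes(USERS, OPT_IN_RATE, REPORT_BYTES)
print(f"{total / 1e9:.1f} GB/month ({total / 2**30:.1f} GiB/month)")
```

Changing the opt-in rate or the report size scales the result linearly, so the estimate is easy to redo once real numbers are available.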

The crash data is likely to be a lot more spiky, but if developers/QA do well it should be significantly smaller.

Ok let's separate these two things. I'm most interested in a server system for collecting anonymous crash data.

My proposal:

  • Write a simple server which takes a POST of crash data, host it on crashes.fedoraproject.org
  • Backend storage - (need advice here), we can just use a POSIX filesystem API for now I guess, but ideally Fedora infrastructure has a scalable key/value store.
  • Write an ABRT plugin which submits data to this system, add this plugin to the default install
  • Remove abrt-plugin-bugzilla from the default install

Concerns:

  • Data usage: We can have a system which culls the data
  • Denial of service: Not sure - FI suggests application has to be written to look at IPs?
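
For illustration, the first two items of the proposal (a server that takes a POST of crash data, backed by a plain POSIX filesystem for now) could look roughly like the sketch below. This is only a minimal sketch, not a design: the names `store_report`, `make_handler`, `make_server`, the 1 MiB size cap, and the choice of the SHA-256 digest as the storage key are all assumptions made for the example.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

MAX_REPORT_BYTES = 1024 * 1024  # assumed upper bound per submission


def store_report(root: Path, body: bytes) -> str:
    """Write one report under root, keyed by its SHA-256 digest."""
    key = hashlib.sha256(body).hexdigest()
    # Two-level fan-out keeps any single directory from growing too large.
    path = root / key[:2] / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return key


def make_handler(root: Path):
    """Build a request handler class that stores POST bodies under root."""

    class CrashHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            try:
                length = int(self.headers.get("Content-Length", ""))
            except ValueError:
                self.send_error(411, "Content-Length required")
                return
            if length <= 0:
                self.send_error(400, "Empty report")
                return
            if length > MAX_REPORT_BYTES:
                self.send_error(413, "Report too large")
                return
            key = store_report(root, self.rfile.read(length))
            reply = key.encode("ascii") + b"\n"
            self.send_response(201)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

    return CrashHandler


def make_server(root: Path, port: int = 8080) -> HTTPServer:
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), make_handler(root))
```

Calling `make_server(Path("/tmp/crashes")).serve_forever()` would start it locally. Because the key is derived from the content, identical submissions collapse into one file, and `store_report` is the only piece that would need replacing if a real key/value store became available.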

Replying to [comment:3 walters]:

> Concerns:
>   • Data usage: We can have a system which culls the data

This includes things like discarding data older than a year and limiting the total amount of data to some quota. It keeps our disk usage from growing more quickly than we can get new resources for it.
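
A culling pass along those lines might look like the following sketch. It assumes reports are plain files under one directory tree and that file modification time stands in for submission time; `cull_reports`, the one-year default, and the 50 GB quota are hypothetical values picked for the example.

```python
import time
from pathlib import Path
from typing import List, Optional

ONE_YEAR = 365 * 24 * 3600  # seconds


def cull_reports(root: Path, max_age: float = ONE_YEAR,
                 quota_bytes: int = 50 * 10**9,
                 now: Optional[float] = None) -> List[Path]:
    """Delete reports older than max_age, then the oldest survivors
    until the total size fits within quota_bytes."""
    now = time.time() if now is None else now
    removed = []
    kept = []
    # Sort oldest first so the quota pass drops the oldest files first.
    files = sorted((p for p in root.rglob("*") if p.is_file()),
                   key=lambda p: p.stat().st_mtime)
    for path in files:
        st = path.stat()
        if now - st.st_mtime > max_age:
            path.unlink()
            removed.append(path)
        else:
            kept.append((path, st.st_size))
    total = sum(size for _, size in kept)
    for path, size in kept:
        if total <= quota_bytes:
            break
        path.unlink()
        removed.append(path)
        total -= size
    return removed
```

Something like this could run from cron; the function returns the deleted paths so the caller can log what was dropped.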

>   • Denial of service: Not sure - FI suggests application has to be written to look at IPs?

wwoods suggested rate limiting by IP. Since we will likely have proxy servers in front of this, looking at the Apache logs won't work. Doing this in the app itself is doable, since the app has access to the CGI environment, which should have the original IP address.
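
As a rough sketch of in-app rate limiting by IP: the code below assumes the proxy passes the client address in an `X-Forwarded-For` header (visible as `HTTP_X_FORWARDED_FOR` in a CGI/WSGI-style environment) and that there is a single trusted proxy in front of the app. `client_ip`, `RateLimiter`, and the default limits are made-up names and values for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Optional


def client_ip(environ: dict) -> str:
    """Pick the client address from a CGI/WSGI-style environment."""
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded:
        # Assumes one trusted proxy that appends the address it saw;
        # the rightmost entry is then the one the client cannot forge.
        return forwarded.split(",")[-1].strip()
    return environ.get("REMOTE_ADDR", "")


class RateLimiter:
    """Sliding-window limiter: at most `limit` hits per `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 3600.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent hits

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget hits that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

The state here lives in one process's memory, so with several app servers behind the proxies each would count separately; sharing the counters would need some common store.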

Also, we need to know how the data is to be stored. If it's in the db, do we have the storage to keep all the crash reports? Will the db server be able to handle the relevant queries? If it's in the filesystem, do we need to set up a network shared filesystem that can handle access to the reports?

Additional note: So far we've been able to avoid having to create a shared filesystem for our apps.

Ok, I can see how this would be valuable. Here's what is required:

1) Someone who is willing to do the work to get it ready
2) an infrastructure sponsor
3) an RFR. Go ahead and fill one of these out: http://fedoraproject.org/wiki/Infrastructure/RFR

Then send a note to the infrastructure list for further discussion.

Not much movement here. ;)

Is this still needed/wanted?

Going to close this now. Feel free to re-open if there's further action to take here.
