#5381 datagrepper is a giant privacy threat
Closed: Will Not/Can Not fix 6 years ago Opened 7 years ago by cwickert.

= bug description =
Datagrepper exposes tons of private data such as email addresses and who did what when. Just look at https://apps.fedoraproject.org/datagrepper/raw?user=cwickert
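To make the point concrete, here is a minimal sketch of how such a query could be scripted. The only thing taken from this ticket is the URL above; the function names `build_query_url` and `fetch_messages` are hypothetical, and the response is assumed to be JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the URL quoted in this ticket.
BASE_URL = "https://apps.fedoraproject.org/datagrepper/raw"


def build_query_url(user, base_url=BASE_URL):
    """Return the query URL for one user's messages (hypothetical helper)."""
    return f"{base_url}?{urlencode({'user': user})}"


def fetch_messages(user):
    """Fetch and decode the response; no credentials are sent at any point.

    Assumes the service answers with JSON.
    """
    with urlopen(build_query_url(user)) as response:
        return json.load(response)


# Building the URL needs no network access and no account.
print(build_query_url("cwickert"))
```

Nothing in this sketch logs in or presents a token, which is exactly the concern: anyone who knows a username can issue the same request.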

It seems the developers did not take privacy into account at all.

= fix recommendation =
We need authentication against FAS. And even that might not be enough. Even though I'm part of the Fedora community, I don't want all of my data exposed to everybody.

The same goes for fedmsg btw.


Since datagrepper came into existence, fedmsg messages no longer remain private. We have had datagrepper since 2013, I guess, and anyone has been able to query it since then. I think the email addresses are exposed only by the recent addition of hooks like bugzilla and mailman.

I agree here, at least, that datagrepper results should be given to FAS account users only.

(fixed the title, as I am quite sure that's threat and not thread)

So, first of all I am not sure this is the right place to have this discussion, this may well be a matter for the council to decide on. I know there was some work on redoing our privacy policy, but I don't know the status on it, and infrastructure can only use the policy that we have now, not a draft one.

That said, if you look at: https://fedoraproject.org/wiki/Legal:PrivacyPolicy

You can see that we have some information that "In keeping with the open nature and spirit of Fedora, some personal information attached to Fedora accounts is made public by default." which includes email address. There's a further note that if you set the private flag on your account these things will not be public, with the exception of email address "which may still be visible in some Fedora services such as Bugzilla." (which should definitely be amended to include datagrepper there).

Next, note that the "who did what when" information here is all just handy in datagrepper, the actual information is in all the services that you interact with. Removing it from datagrepper would just mean someone could get the same information from multiple other queries, many of them requiring no authentication. For example, anyone (not even logged in) can query bugzilla and see your actions, or search mailman, etc.

Putting auth in front of datagrepper is something we could do, but it would break any scripts/uses people currently have that don't expect it. And if access were open to anyone with a FAS account, it would be a pretty low barrier: someone could just sign up for an account. If it were not, then what would it be restricted to? What group of people should have the information?

Perhaps you could expand on why you don't want your open source contributions visible? Would you also want particular services to not show your contributions?

Perhaps we could expand our FAS privacy setting to hash the identities of all folks who request privacy in fedmsg/datagrepper? But it would be pretty easy to figure out who was who. Also note that your FAS account is not set private currently. ;)
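As a rough illustration of why hashing alone would be weak, here is a minimal sketch. It assumes a hypothetical scheme in which the username is replaced by an unsalted SHA-256 digest; nothing like this exists in fedmsg or datagrepper today, and `pseudonymize` and `deanonymize` are made-up names.

```python
import hashlib


def pseudonymize(username):
    """Replace a username with an unsalted SHA-256 digest (hypothetical scheme)."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()


def deanonymize(digest, known_usernames):
    """Recover a username by hashing every publicly known candidate."""
    lookup = {pseudonymize(name): name for name in known_usernames}
    return lookup.get(digest)


# Usernames are visible in other public services, so a candidate list is easy to build.
candidates = ["kevin", "cwickert", "someone_else"]
hidden = pseudonymize("cwickert")
print(deanonymize(hidden, candidates))  # prints: cwickert
```

A salted or keyed hash would stop this particular lookup, but the pattern of activity attached to a pseudonym could still be matched against the same actions in the public source services.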

So, it's been 3 months here.

I'm going to close this and ask that you bring it up to the council if you wish us to adopt a new policy here. We are simply aggregating existing public info in one place.

Kevin, I think you should not have closed this ticket. It is not about the priva

Sorry, I submitted my comment too quickly. I hit the "Update Issue" button hoping it would allow me to edit details such as the status of the ticket, and instead it posted my comment. Not very intuitive; the button should be labeled "Submit" instead.

Back to the initial topic: Kevin, I think you should not have closed this ticket since it is definitely not "fixed". The purpose of this ticket is to limit access to the aggregated information. It is not about the privacy policy, and working on one is a huge effort that likely will not be done in time for the fedorahosted -> pagure migration. In fact, we are facing a chicken-and-egg problem: we need to know what data is collected and how it is made available before we can work on a transparent privacy policy. Otherwise we end up with something vague like "Your personal data such as [we don't know] might be collected and made available to third parties without authentication."

So please be so kind as to reopen the ticket for the technical side of things. Once the technical things are clear, the council can deal with policy elsewhere.

@cwickert changed the status to Open

7 years ago

We can reopen, sure... (well, looks like you already did).

I am personally against requiring authentication to access datagrepper for at least the following reasons:

  • It would break every single thing that has grown up using datagrepper. This not only means a bunch of things in infrastructure, but a bunch of things users are using. Things like fedmsg-tail, fedmsg-notify, and other end user items used by I don't know how many people.
  • It would mean simple one-line calls would have to grow a means of storing and using auth, potentially meaning users would put their FAS credentials in scripts, making them more vulnerable.
  • I don't agree that simply aggregating open data means it needs to be more restricted than the open sources it comes from. Why does this need auth protection, but pkgdb, cgit, koji, etc do not?

Aside from requiring auth for datagrepper, are there any other proposed technical solutions that would help with your concerns?

Hey @cwickert It's been almost another year here.

As far as I can see, the only technical suggestion here is to put all fedmsg data behind authentication. I disagree that this is needed or desirable. So, IMHO, you should ask the council if they would like to look into this and make some statement about what level of privacy is required for fedmsg data. They can of course override infrastructure and ask us to implement auth for fedmsg data if they wish.

I do ask kindly if you could include me in any council ticket or thread so I can provide input as well.

Thanks and sorry we haven't agreed here.

Metadata Update from @kevin:
- Issue close_status updated to: Will Not/Can Not fix
- Issue status updated to: Closed (was: Open)

6 years ago
