Online social networks are gathering information about their users that those people never intended to disclose, and government regulation may be the only way to stop the practice, a researcher said Tuesday.
People deliberately disclose a great deal of their personal data to social networks such as Twitter, Facebook and LinkedIn, but the networks can use that information, and data about users' online behavior, to infer even more, allowing them to build extensive user profiles, said Christian Zimmermann, a researcher at the University of Freiburg in Germany, at the Amsterdam Privacy Conference.
For users to regain control over what the networks know about them would require the networks to be more transparent about their methods. That's unlikely to happen without regulation or an economic incentive, because transparent processes would be hard to implement and knowing less about their users is not in the networks' interest.
Existing techniques already allow the social networks to determine the purchasing power, ethnicity or political affiliation of users who did not intend to disclose such information, Zimmermann said. Making such inferences without informing the users constitutes a severe threat to privacy, and allows unprecedented user profiling, he said.
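The kind of inference Zimmermann describes can be illustrated with a deliberately simple sketch. The rules, field names, and interest labels below are invented for illustration only; real networks use far richer behavioral signals and statistical models, not hand-written rules like these.

```python
# Toy illustration: deriving attributes a user never stated from data
# the user did disclose. All rules and field names here are invented.

def infer_attributes(disclosed):
    """Apply simple hand-written rules to a user's disclosed interests."""
    inferred = {}
    interests = set(disclosed.get("liked_pages", []))
    if {"luxury watches", "business class travel"} & interests:
        inferred["purchasing_power"] = "high"
    if {"labor union news", "minimum wage campaign"} & interests:
        inferred["political_leaning"] = "left"
    return inferred

profile = {"liked_pages": ["luxury watches", "cat videos"]}
print(infer_attributes(profile))  # {'purchasing_power': 'high'}
```

The point of the sketch is that nothing in the profile says "high purchasing power"; the label appears only in the output, which is exactly the derived data the user never intended to disclose.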
"Knowing someone's political affiliation might not be so important here, but it might be very important in countries like Syria," said Zimmermann, who added that kind of inferred data might have a big impact on someone's life there.
Indications of a user's purchasing power can also be useful for companies, he said. "Recently it was discovered that a hotel booking site showed more expensive hotels to Mac users that visited the site," Zimmermann said, referring to online booking site Orbitz, which showed visitors rooms in more expensive hotels depending on the computer they used.
Similar techniques could benefit social networks such as Facebook, Twitter and LinkedIn, which offer a mostly free service to users, and generate the lion's share of their revenue by selling advertising space. The highest price is paid for targeted advertising that matches the user's interest and purchasing power, and inferred data could help a lot in targeting ads, Zimmermann said.
"Revenues of data-centric service providers depend directly on the data they gather from users," he said.
The problem is that social network users have no idea what information is pieced together about them, while from a privacy perspective it should really be the user, not the social network, who is in control of the data, Zimmermann said. "Of course there is a privacy risk, because there is derived data that you really did not intend to disclose. From a user perspective it is a black box."
People can't know what data is inferred about them because they don't know what rules are used to build the extensive user profiles, he said. Moreover, the rule sets used by social networks evolve constantly. As users publicize new information and the provider gathers new data, new patterns emerge, causing old patterns to change, and it is impossible for a user to predict these changes, he said.
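Why rule sets shift as new data arrives can be shown with a minimal sketch of frequency-based rule mining. This is not any network's actual algorithm; the threshold, the interests, and the co-occurrence rule are all invented for illustration.

```python
# Toy sketch of evolving inference rules: a "likes A -> likes B" rule
# holds only while the co-occurrence rate among A-likers stays above a
# threshold, so one batch of new users can silently retire or create rules.
from collections import defaultdict

def mine_rules(profiles, threshold=0.6):
    """Return (A, B) pairs where at least `threshold` of A-likers also like B."""
    likes_a = defaultdict(int)
    likes_ab = defaultdict(int)
    for liked in profiles:
        for a in liked:
            likes_a[a] += 1
            for b in liked:
                if a != b:
                    likes_ab[(a, b)] += 1
    return {pair for pair, n in likes_ab.items() if n / likes_a[pair[0]] >= threshold}

old = [{"hiking", "camping"}, {"hiking", "camping"}, {"hiking"}]
print(mine_rules(old))   # 'hiking -> camping' holds at 2 of 3

new = old + [{"hiking"}, {"hiking"}]
print(mine_rules(new))   # the same rule no longer holds at 2 of 5
```

From the outside, a user has no way to see that the arrival of two unrelated profiles just changed which inferences are drawn about everyone else, which is Zimmermann's unpredictability point.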
One way to prevent inferences being made from data is not to disclose it in the first place, but that is very hard for users to do when they don't know how an online service is combining information about them. Besides that, others might be disclosing the very information that they are trying to keep secret.
Because of these challenges, profile building based on user behavior, rather than on the information users deliberately disclose, cannot be prevented entirely, but it can be limited, Zimmermann said.
Users could, for instance, use privacy-enhancing technologies such as Tor (The Onion Router), or use pseudonyms to hide their online identities. "They can be really helpful but cannot solve the problem," Zimmermann said. Using those technologies doesn't stop social networks from gathering inferred data, he said.
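A small sketch shows why a pseudonym alone doesn't defeat behavioral inference: the signals feeding the profile, such as when an account is active, are independent of the name attached to it. The timestamps and similarity measure below are invented for illustration, not a description of any real linking system.

```python
# Toy sketch: linking a pseudonymous account to a known profile by
# comparing hourly activity patterns. All data here is made up.
import math
from collections import Counter

def hour_histogram(hours):
    """Count activity events per hour of day (0-23)."""
    return Counter(h % 24 for h in hours)

def similarity(a, b):
    """Cosine similarity between two hourly activity histograms."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

known = hour_histogram([23, 23, 22, 8, 23])    # a known, named profile
pseudo = hour_histogram([23, 22, 23, 7, 23])   # a pseudonymous account
other = hour_histogram([12, 13, 12, 14, 11])   # an unrelated account

# The pseudonymous account matches the known profile far better than
# the unrelated one does, even though its name reveals nothing.
print(similarity(known, pseudo) > similarity(known, other))  # True
```

Hiding the network address with Tor or the name with a pseudonym leaves this kind of behavioral fingerprint untouched, which is why Zimmermann calls such tools helpful but not a solution.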
Another possibility is to enhance transparency. Social networks could disclose what information about users is tracked, said Zimmermann. The main problem with transparency is that social networks lack the incentive to make inferred data available to users, he said.
A possible solution is to take regulatory action and impose monitoring requirements on providers, he said. "But that is not easy to do," he said. If the European Union's 27 member states were unable to agree on a single regulation, then social networks would have to figure out a way to apply a patchwork of national regulations to users' data.
Making the process transparent is not sufficient to stop the services from building profiles based on inferred data, though, he said. A combination of limiting data disclosure and increasing transparency can tame the inference problem, said Zimmermann, but still cannot prevent inferences completely. It will, though, give users the means with which to limit the threat in the long run.