Hadooponomics: Impacting Change with Big Data – What the Panama Papers Can Teach Us (Podcast Transcript)

James Haight: You’re listening to the Hadooponomics podcast, and, as always, I’m your host, James Haight. Pleasure to be back here with you guys this week. We have a real treat of an episode for you. And I know I always say that, but this is a subject and a topic that’s near and dear to my heart, and really happy to bring this story to you guys. Our guest today is Mar Cabra, and she’s the head of the Data and Research Unit at the International Consortium of Investigative Journalists. And she’s done a lot of really amazing things and worked on some pretty cool projects, but principally, what we’re talking about in this episode, is actually the Panama Papers. She had a major hand in breaking the news stories and exposing all this to the world. For those of you who aren’t familiar, the Panama Papers, one of the biggest investigative journalism leaks the world’s ever seen. And it sort of exposed the underworld and tax havens of powerful leaders, politicians, celebrities, and the ultra wealthy, and really had some amazing impacts in terms of toppling world leaders and exposing huge levels of corruption that previously had never been brought to light.

So with this episode, you’re gonna notice that pretty quickly it’s a bit of a change of pace. This is not the most technical episode we’ve ever done. In fact, we keep it pretty surface level, and really talk about the impact of data as a whole, rather than digging into the weeds. So I’d encourage any of you who maybe are a little unfamiliar with listening to an episode like this to think about this in terms of how we can use data to shape and influence the world. And think about this takeaway that data, by itself, just finding the answer is actually meaningless if you can’t apply it, if you can’t effect change, if you can’t get the right people on board to influence action in the future. So whether you’re using data and crunching numbers to optimize your profit model and increase customer conversions, or if you’re using it like Mar, to help topple world leaders, I think a lot of the principles are the same. So think about it in this way: we’re wrapping up a lot of the concepts and topics that we’ve talked about on the importance of communicating data and what technology can do as an enabling force going forward. And wrapping it up in a pretty cool and amazing story that I think we’re really lucky to have Mar on the show to talk about.

So with that, I’ll make the obligatory plug, bluehillresearch.com/hadooponomics. It’ll have the show notes, it’ll have the transcript, it’ll have links to this episode, and, of course, how to get in touch with Mar. And it’s a really awesome resource if you want to sort of go back and dig a little bit deeper and find out more about this subject. So, with that, I’m gonna step out of the way, and let’s go straight to the interview with Mar.

Okay, everyone, I am here with Mar Cabra. She is the head of the data research team for the International Consortium of Investigative Journalists. Mar, welcome to the show.

Mar Cabra: Thanks so much for inviting me.

James: Mar, we have an awesome episode laid out in front of us. I’m really excited. You are one of the most interesting people I think we’re ever gonna have a chance to talk to, which may sound like hyperbole, because I say it for a lot of our guests and episodes. But rather than me sort of explaining you to the audience, just give us a little bit of background about who you are, what you do, and we can take it from there.

Mar: Well I think that we’re gonna talk about a very fascinating investigation called the Panama Papers, which has been the largest leak in journalism history so far. And we worked on it at the International Consortium of Investigative Journalists, which is the media organization I work for. And we are a not-for-profit based out of Washington, DC, but one that connects reporters from all over the globe. For the Panama Papers we gathered a big team of 400 reporters in 80 countries to work together, in secret, over a year. And myself, well, I have to say, I’m a journalist, although technology has changed my life. I used to be a TV journalist and I discovered data while I was studying at Columbia University in New York. And I thought that if journalism is about making sense of the world around us, journalism then needs to make sense of the electronic world that surrounds us today. And that’s how I got into data, and now I lead this team inside the ICIJ that is a multidisciplinary team of reporters and coders that try to help ICIJ do better cross-border investigations.

James: Absolutely, and so one of the things that we do on this show is we sometimes bring in a more technical audience to talk about sort of the speeds and feeds, if you will, of what’s powering the next generation of data trends. And then also we have an opportunity here, though, to talk about what does that mean, right? If we look at technology as this enabling force, if we look at all the advances that we’ve talked about, and led up to, and are looking forward to in the world of data, what does that actually mean for us in our daily lives? And how’s it gonna affect it going forward? And that’s why I’m really glad to have you on the show, because, I mean, we’re gonna talk about the Panama Papers, and I can’t imagine a better example of how technology and data’s directly impacting some pretty major change on the worldwide stage.

Mar: Right, I think that we’re living in a new ecosystem in the world, and in journalism, especially, where electronic leaks are the new normal. And the Panama Papers is the best example of how getting a leak to a journalist, or leaking private information of a company is very easy today. And what we do in journalism, and what we did on the Panama Papers was to tackle this issue through global collaboration and through the use of data journalism and data analysis. And I think that that’s one of the successes of the Panama Papers, and this investigation, is that we try to use technology to drive the analysis of this big leak.

James: So before we dive in, let’s stop at two points here. Number one, give us that 15 to 20 second pitch of what the Panama Papers are, for anyone in the audience who might not be so familiar. And then, two, what’s the process? You didn’t hack into anything, someone gave you these files. So let’s take this two part step here, and first let’s just start off, what are the Panama Papers, and why should we care?

Mar: The Panama Papers is the largest leak in journalism history. It’s 2.6 terabytes of information, 11.5 million files, that allowed reporters to dive, for the first time, inside the offshore machine, inside this offshore economy that is used by the powerful as a parallel economy. And it happens in secret, because you create a company in a tax haven, and nobody knows what’s being done with that company. And those companies are being used to finance terrorism, to fund the war in Syria, for example, or to do illegal things, in many cases. In some others, some legal issues. So we did an investigation for a year, 400 reporters in 80 countries to expose this system that affects us all. And what we did was to use technology to analyze these leaks, and I can tell you a bit more about that part. Before I do that, I forgot your second question [laughs], so remind me of the second question!

James: [laughs] Sure, I’m just curious about the process, right? You guys didn’t go try and find some shady law offices down in Panama and hack their systems. Someone, did they show up with a USB file at your desk, how does that work? [laughs]

Mar: Right, okay, yes, so actually, you may wonder, after my explanation, how did you get these 2.6 terabytes of information? And leaks, in the past, in journalism, depended on people making photocopies and giving you a stack of papers. Or were dependent on you meeting in a parking lot, right, like in the case of Watergate and Deep Throat. And today, you don’t even need to meet with a source. Some reporters from Süddeutsche Zeitung in Germany, a newspaper based out of Munich, received an email. And that email said, hey, are you interested in data? And the response of my colleague, Bastian Obermayer, from that newspaper, was, of course! [laughs] And with that first email interaction is that everything started. I think that the interesting thing about this is that my colleague Bastian and his colleague Frederik in Süddeutsche Zeitung didn’t stop there. They didn’t just get the 2.6 terabytes of data and say, hey, I’m gonna do this great story for my newspaper in Germany. They immediately saw the global potential of this leak, and there were connections to more than 200 countries in the world. So they came to the ICIJ so that we could form a global team and investigate this together. And I think that that’s one of the beautiful things about the Panama Papers, is it started with a leak from an anonymous source, that started with just an email, but ended up being a big investigation tackled in a global way.

James: And to provide a little bit more context to the global reach of this, people like David Cameron, Lionel Messi, possibly the best soccer player in the world, Prime Minister of Iceland, I think even Emma Watson, right? A lot of people were linked to this, and I think it caused some pretty high profile backlash, and resignations, and changed the global power landscape, too. So for anyone in our audience who says, well, terabytes, we deal in petabytes, a terabyte’s a small amount of data, comparatively there’s still enough in there to shift [laughs] the political landscape out there. So pretty large impacts.

Mar: Well for journalists, terabytes is enough [laughs]. I think that normally we deal with even smaller information. I think it’s not so much the volume of it, as to the importance of the contents, right? And I think that even though we knew about tax havens, and it’s an open secret, everybody knew that tax havens were being used. What we were able to do, by analyzing those 11.5 million files, was to expose, from the inside, how this machine worked, and how tax havens worked, and who used them. And, of course, we had these high profile clients of this system. The data came from Mossack Fonseca, which was a Panamanian law firm. It’s one of the top law firms in the world in creation of offshore companies. But it’s just one of them, I would say it’s the tip, of the tip, of the tip of the iceberg, right? And just by looking at them, and we looked at, basically, around 40 years of activity of this Panamanian law firm that had activity in more than 20 jurisdictions. We were able to show high profile names. As you’re saying, we found 140 politicians from more than 50 countries. Among them, there were a dozen country leaders. Some of them were in power. We had the Prime Minister of Iceland, we had the current President of Argentina, and, of course, we had businessmen, and we had celebrities. But it was so varied, and we had basically every single representative of the powerful tier of society. And with names that anybody could recognize. That’s why the Panama Papers shocked the world, because we put faces to a topic that was known, but that was not being tackled.

James: Yeah, absolutely, and there’s so many corollaries with this to so many of our other discussions. And I think we’re gonna go into each of them, and there’ll be sort of common themes that have run throughout a lot of our episodes. But the first that I wanna dive into is, we talk a lot about, on this show, the ability to tell a story with data. It’s one thing to have an answer or to come to a conclusion, and it’s a totally different thing to influence people to actually effect change. And I would love to know what the process is for you guys to tell stories, right? We see people like Nate Silver, and the New York Times is doing a great job with data journalism, and really bringing data storytelling to the masses. What’s the process like to make sure that you’re actually heard, so people understand that what you’re doing actually matters?

Mar: Well I think that the first good thing that we did was two and a half years ago we created the Data and Research Team at the ICIJ. We did not have programmers and data specialists in-house to work with these big leaks, or with this massive amount of public information that needs to be crunched. And today, the Data and Research unit is actually half of the ICIJ staff. We are a very small organization, we only have 12 people on staff, and the team is half of the staff. So that shows you how much value we give to technology in our work. And that has allowed us to basically take some steps that we were not taking before. And one of the first things we do once we have data, either if it comes from a public source or from a leak, like in the case of the Panama Papers, is to look at it and spend weeks, if not months analyzing it and processing it. With the Panama Papers, we received this hard drive with 2.6 terabytes of data. Well, I cannot start a global collaboration without knowing what’s inside. So we looked at what was inside, we started looking at the different types of files, and the first bad news was, wow, we have more than 3 million PDFs [laughs]. So in that moment, for example, we knew that we had to do a lot of processing and a lot of optical character recognition to extract the text from those images and make them searchable.

The second challenge that we had to work on is, okay, how do I make these files available to hundreds of reporters working in 80 different countries? And the good thing is that we had already developed some technology that would help us put it up in the cloud. So in past projects we have basically created or adapted certain software that is open source and that we have adapted to journalism. So let me give you an example. We got an open source tool called Blacklight that is basically a user interface to search for books in libraries. And we got that and connected it to Apache Solr as a search engine. And with that and some security around it, our reporters could log into a document search platform and search the documents from anywhere in the world, any minute. So that allowed reporters to basically search all the documents.
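For readers who want a concrete picture of the search layer Mar describes, here is a minimal sketch of how a client might build a full-text query against an Apache Solr index. Only Blacklight and Solr come from the interview; the host, the core name `documents`, and the helper `build_solr_search_url` are hypothetical, and this is not a description of ICIJ's actual setup. The sketch only constructs the request URL for Solr's standard `/select` handler and does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_search_url(base_url, core, text, rows=10, start=0):
    """Build a URL for Solr's standard /select search handler.

    base_url and core are placeholders for wherever the index is hosted.
    """
    params = {
        "q": text,       # the full-text query, e.g. a quoted phrase
        "rows": rows,    # how many hits to return per page
        "start": start,  # offset of the first hit, used for paging
        "wt": "json",    # ask Solr to answer in JSON
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance with a core named "documents".
url = build_solr_search_url(
    "http://localhost:8983", "documents", '"offshore company"', rows=5
)
print(url)
```

A front end in the style of Blacklight would issue requests of this shape on the reporter's behalf and render the returned hits; the login and other security measures Mar mentions would sit in front of it.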

Another need that we had was to communicate among ourselves. So we got an open source social networking tool and adapted it to what I would say is data dating [laughs]. So we basically got all these reporters sharing what they had found in the data. And I said data dating because this social networking tool was originally designed for dating. So actually, one of the questions that you would get in the form is, what are you looking for, male or female? Well, we were not looking for males or females, we were looking for other reporters that could share leads and tips so that they could get to work together.

So I would say that it was very important first to see what we had, try to process it, then establish tools that would allow us to collaborate across borders. And it’s then, through the sharing, that we started finding patterns. And once we had those patterns, and we had the global picture, we decided on what was the best way to tell this story. So we not only tell stories through words. Of course, we did our 5,000 word story that had everything, and we did other, smaller stories, and we shared that with the partners, more than 100 media partners that participated in the Panama Papers. But we also think, how can we convey this idea that tax havens are being used in a systematic way throughout the world? Well, we did an interactive that basically uses illustrations, and that uses graphs, to show how the main politicians in the data were using offshore. So you can get around 70 stories that you can interact with one by one.

We also did, for example, an interactive game, where people could play and be in the shoes of a soccer player, and a businesswoman, and a politician, to actually be them and try to evade and avoid taxes.

James: [laughs]

Mar: So through this small game you would be able to learn how this world really works. And something else we did was to actually publish some of the data online to give this investigative power to the citizens worldwide. And we made a database, an online database, called the Offshore Leaks Database, where anybody can search the main names of the companies connected to the Panama Papers and other leaks that we have received, and search the connections to almost 500,000 companies in tax havens. So we’re basically helping break the secrecy that tax havens provide by providing this online database that didn’t exist before.

James: Absolutely love it, and part of the reason why I wanted to ask and understand how you guys did it is so many people, if we take this back to a lot of folks in our audience, you find a really interesting conclusion. Maybe it’s about your profit model, or your customer conversion rate, or a new mine that you’re trying to drill, right? You find this and you just assume that the data itself are compelling, that these numbers that you have are enough to convince people. And so then you go and try, and say, hey, look what I found, let’s change something, and it falls on deaf ears. They don’t understand or they don’t realize the potential of what you’ve found, because what you think is compelling on its own merit doesn’t necessarily seem that way to some of the decision makers.

So that’s on the company level. And with you guys, obviously, it sounds like, hey, we found the King of Saudi Arabia’s using tax havens, right? That sounds like it is worth the merit just on the data itself. But I suspect you still have to tell people why it matters, right? I think the fact that you had to do it, and the importance that you guys have of storytelling for something that seems so overtly, obviously important speaks to just how actually necessary that process is for someone who’s dealing with something a little less evocative, right, with your profit models, or your customer conversion rates.

Mar: Right, well, for us it was very important to have a platform where we would share what we were finding. And I think that it’s not just that we gathered 100 media organizations to work together, it’s that we got them to communicate on a daily basis. So having this social network tool where you could log in and see what others had been working on, and what other trends people from other parts of the world were finding, was key to be able to find the story, and what was the key story that we wanted to convey. And without that daily communication, it would have been very difficult to start finding some trends. Like how the sports world uses offshore, or how big, or how many names of politicians we had in the data. So I think that one of the big takeaways of this investigation is that if you share, you get a lot in return. And in our case, what we did was to share all the documents, but also share what we were finding. And I think that that’s quite unusual for journalists. We normally keep the information to ourselves because we want our exclusive in the front page with our name. But, actually, what we’ve found is a new model where if you collaborate and you share the glory with others, you make a higher impact. And that’s what the Panama Papers shows, is that we’re no longer living in a world where the lone wolf can be successful. If you collaborate and share the insights, you’ll be able to go beyond what you would be able to do if you were working by yourself.

James: And that’s a perfect segue to another piece that I want to bring up here. And we see it all the time in the Big Data world, but it’s a broader technological trend, and that is this idea of decentralization. Where we have moved from this model where all the data was in one spot and analyzed by one team, and then distributed in a top-down fashion, and that was the way it worked. And now we’ve moved, in sort of the data analytics world, we have all these, some people call them citizen data scientists, sometimes just self-service data discovery, where there’s an opportunity to access data from anywhere and come up with your own conclusions, and to really get a much broader base of people working on it. And there just seems like a tremendous storyline, a corollary between what’s happening in that realm and then sort of what’s happening in the broader journalism world and the world in general, where the individuals seem to be a whole lot more powered than they really ever had been at any point in our lives so far.

Mar: Right, I think that we can no longer tackle issues by ourselves. [laughs] I think that we need to look at the world and we see that corruption works in a global way. That terrorism works in a global way. That tax evaders work in a global way. And in journalism, we just realized that we need to work in a global way. I do think, though, that it is very important to understand that there’s a human element to this that you need to cultivate, and it’s trust. I don’t think that we would have been able to work and share so much without trusting each other. And that’s something that doesn’t happen overnight. The ICIJ, actually, has been in existence since 1997, so almost 20 years. And in those 20 years we’ve been cultivating that trust to get to the point where we are today. We are not very critical human beings, in journalism and in general, and in order to get this global power, you need to trust each other. And I think that that’s not technology, but it’s something that needs to be put into these decentralized models, is how do we get the people that work in these networks to trust each other? And there’s some thought that needs to be put into that, too.

James: Yeah, no doubt, and I think you bring up this idea of the human element, and our audience will definitely remember back to a recent episode we had with Morgan Wright. He’s a cybersecurity expert and talks about everything from state sponsored terrorism, and he’s pretty much done it all. And the really interesting part is that hey, it all comes down to the people that you have. Because no system is going to be able to protect yourself against a rogue actor or someone who is in the know and then decides that what you’re doing doesn’t really fit them anymore, or maybe they just get a little lazy. And I would love to know how this anonymous source came to give you information for this, and then speak to this idea of, is this happening more and more? How is this sort of trend, and the human element, being responsible for a whole lot more information sharing, and that sort of ideal?

Mar: Well, security was one of the important aspects of this project, too. And as you were mentioning, the weakest links were the humans. [laughs] So no matter how much security measures we could put around it, we really had to train our reporters. I think that the Panama Papers worked the way they did, using technology and cloud based technology, to communicate, and do research, and share the documents, because the NSA was not our enemy. When we started the project, we started doing threat modeling analysis and looking at who could be our enemies. Who would wanna get hold of this data, and what would happen if they got hold of this data. And we discarded, from the beginning, that government agencies would be our enemies, because, after all, if they got hold of this data it would be useful for them, too, to find tax evasion, right? So that was key to be able to find a set of tools that allowed us to collaborate in the cloud. And we don’t talk much about how we communicated with the source, he called himself John Doe, and I say he or she, we don’t know. But everything started with just this message, right, about [laughs] hey, are you interested in data? After that, the data ended up being in a hard drive that we moved encrypted around Europe and around the world. And once we had the hard drive encrypted in the correct locations, where developers of the ICIJ team work, we basically uploaded it to Amazon and we did everything from there using Amazon Web Services. Again, because we knew that government agencies were not our enemies, were not our threats. We did try to give training to reporters on security, used things like two step authentication for our platforms to give an added layer of security, and also promoted the use of encrypted email through PGP for the communications that happened on email, so that in the case of an email getting hacked, we could at least not have the conversations right in the open.

James: [laughs]

Mar: So some of those, that was a challenge, because many of the journalists in the team didn’t know what this was, didn’t know what PGP was, two step authentication. Getting people to get an authenticator in their cellphone was a nightmare in some cases, it was a challenge. But I think that that’s a risk that we could not have, and we had to do a lot of effort on that note.
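As background on the "authenticator in their cellphone" Mar mentions, here is a minimal sketch of how a time-based one-time password (TOTP, RFC 6238) is typically computed and checked. The interview does not say which two-step scheme ICIJ used, so treating it as TOTP is an assumption, and the function names `totp` and `verify` are hypothetical. The phone app and the server share a secret, and both derive the same short code from the current 30-second time window.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, for_time=None, step=30, digits=6) -> str:
    """Compute an RFC 6238 time-based one-time password using HMAC-SHA1."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time) // step            # index of the current time window
    msg = struct.pack(">Q", counter)           # counter as an 8-byte big-endian value
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                 # dynamic truncation offset (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, for_time=None, step=30) -> bool:
    """Check a submitted code against the code expected for the given time."""
    expected = totp(secret, for_time, step)
    return hmac.compare_digest(expected, submitted)  # constant-time comparison


# Shared secret taken from the RFC 6238 test vectors, not a real credential.
demo_secret = b"12345678901234567890"
print(totp(demo_secret, for_time=59))
```

Because the code depends on a secret stored on the reporter's phone, a stolen password alone is not enough to log in, which is the added layer of security described above.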

James: And one of the things that strikes me is you take that example and bring it to industry, people are people. For someone working in, perhaps, healthcare, maybe financial services, or, of course, in a government agency, they might be hearing that and say, oh, yeah, of course. Of course I have two factor authentication. But for any organizations that don’t really have that culture, you shouldn’t underestimate the hurdle it’s gonna take to get everyone on board to do that. If there’s a way to kinda skirt around it, make my life a little bit easier, if I’m just a front line employee who doesn’t really care and I’m not bought into the mission, or vision, or understanding of why the security’s so important, if there’s a way to skirt around it, I’m gonna do that. I think that’s a really interesting takeaway from sort of the experience that you had. I think it directly applies to a whole lot of our audience.

Mar: Well I think that the best motivation to get them into the security measures that we had devised for this project was that they could not access the data without them [laughs]. So I remember at the beginning, before this project, I had been trying to get my boss into PGP, and I couldn’t, I couldn’t. I got it installed in his computer, and still he was not using it. Well, we started having conversations at the beginning of the Panama Papers only using PGP. And I told him, I’m sorry, Gerard, we’re gonna have to have the conversations without you. [laughs]

James: [laughs]

Mar: At that moment he got into PGP immediately. Or the same thing, we also told reporters, if you don’t have two step authentication, there’s no way you’re gonna get in to communicate with the reporters, or you’re not gonna be able to search the documents. Well, if you have a motivation like this, which is like the carrot and the stick, then you really do it. And now people come to us and tell us, oh, thanks so much for doing it, I really think that this is something very important that is actually changing the way I do things now, moving forward.

James: Excellent, yeah, [laughs] couldn’t agree with you more. Thinking about how you can do more carrot and less stick is probably gonna be more successful most of the time.

One of the things I wanna do while we have you here is, you’re about to speak at the Strata-Hadoop conference in New York. You have a vision for sort of how data’s going to continue to change, certainly, your world, and the world of investigative journalism, and then sort of by proxy of that, more so the rest of us, and influence the rest of the world. I’m wondering if you can give us a sneak preview into some of the things that you have on your radar about how the world’s gonna continue to change.

Mar: So the Panama Papers has been a very successful investigation. We had put the spotlight on a topic that had been ignored before, and the impact of that has been high profile resignations. We had, for example, the Prime Minister of Iceland stepping down right after the publication of the information, and his connections to an offshore company in the British Virgin Islands. We had people going into the streets protesting, we have tax agencies using the data to look for money that has not been paid in taxes back home. We have lawmakers trying to bring more transparency to the use of tax havens and the corporate registries of these places. So the impact has been unprecedented, and we’re still seeing the effects that will be long lasting. Everything that we achieved was basically scratching the surface of the documents that we received. We used cloud based technologies to search the documents. Technology was great in enabling the cross-border collaboration. But there’s so much more that we could’ve done if we had had more resources, if we were as advanced as tech companies or businesses that are really adapting to big data are. So we did not do good content analysis, we did not do entity structure, we did not analyze patterns in the emails of these leaks. There are so many stories that we’ve probably missed. So I think that technology is infiltrating journalism, but there’s still so much more that we can do, and I’m gonna try to encourage this Strata crowd to join forces with us so that we can keep getting the powerful challenged. Because the world is moving towards the use of Big Data, therefore journalists should embrace that, too, in the same capacity. And that’s basically what I’m gonna be talking about in the conference.

James: Amazing, and a lot of the people that you’re gonna be surrounded by, they’re working on building machine learning, artificial intelligence, cognitive computing, pattern recognition, at the highest ability that we’ve ever seen. And an ability to process just unprecedented amounts of data in the smallest amounts of time. And it’s amazing to think of what can happen when you combine all this potential with the application that you guys are talking about. And then not just the investigative journalism community, but any sort of organization out there. And for our listeners who are trying to find patterns and tell more stories from the data that they have, which strikes me as really the heart of what you’re trying to get at. So it’s gonna be interesting to see how we bridge this gap between what is possible out there on one side, and what people are trying to do, and how do we marry those two together.

Mar: Right, so imagine I go on Facebook, and Facebook recommends me things that I need to buy, right? And it’s suggesting me things that I should do, or places where I should travel. Imagine that applied to investigative journalism. Imagine I go into the Panama Papers and, I don’t know, technology tells me, huh, you have been researching this guy from Syria. Well, maybe you’re interested in these other Syrian people. Or maybe you might be interested in looking at this interesting trend. I really want technology to help us find things that we didn’t know we needed to look for. And I think that that’s a great application, for example, of these recommendation systems that we’re seeing right now. So I think there is so much more to do. And, of course, there is one application which is the business world and the commercial market, and that’s where the money is. But let’s not forget that journalism is key to democracy, and that if we want to keep having a world where we can keep the powerful to account, we need to use those tools and those advanced software and machine learning to actually help us find stories that maybe we’re missing.
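To make the recommendation idea concrete, here is a toy sketch of one simple approach, co-occurrence counting: suggest the entities that most often appear in the same documents as the one a reporter has been researching. The technique is an illustrative choice and not something described in the interview; the function `recommend_related`, the document IDs, and all names in the sample data are invented placeholders.

```python
from collections import Counter


def recommend_related(doc_entities, researched, top_n=3):
    """Suggest entities that share documents with `researched`.

    doc_entities maps a document ID to the set of entity names found in it.
    """
    counts = Counter()
    for entities in doc_entities.values():
        if researched in entities:
            for other in entities:
                if other != researched:
                    counts[other] += 1
    # Most frequent first; ties broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Invented sample data: which entities were extracted from which document.
docs = {
    "doc-001": {"Person A", "Company X", "Company Y"},
    "doc-002": {"Person A", "Company X"},
    "doc-003": {"Person B", "Company X"},
    "doc-004": {"Person A", "Person C"},
}
print(recommend_related(docs, "Person A", top_n=2))
```

A real system would first need the entity extraction that Mar says the team did not have the resources to do, and would likely weight suggestions by more than raw counts.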

James: So, Mar, it’s been a pleasure having you on the show. You’re an interesting person [laughs], you’ve traveled all over the world, been doing some pretty great stuff, and maybe some of it you can’t tell us about, but if anyone in our audience wants to follow you or figure out what you’re up to, where are they going to do that?

Mar: Well I think that the best thing that they should do is go into the ICIJ website, which is icij.org, and subscribe to our newsletter. We only publish a couple of investigations a year, but when we do they’re powerful, they have great impact, and I’m sure your audience wants to be the first to know. So that’s what I would recommend them. I’m normally traveling the world, based in Spain, but I tweet a lot about how technology is affecting journalism, and this intersection between the two worlds. So my Twitter handle is @cabralens, like the lens of a [laughs] camera, right? So @cabralens.

James: Well, Mar, it’s been an absolute pleasure to have you on the show today. I just wanna say thanks so much for coming on.

Mar: Thanks so much for inviting me. I hope we can keep doing great work through the use of technology.

