[Source: This article was published in thegroundtruthproject.org By Josh Coe - Uploaded by the Association Member: James Gill] 

Last week ProPublica uncovered a secret Facebook group for Customs and Border Protection agents in which a culture of xenophobia and sexism seems to have thrived. The story was supported by several screenshots of offensive posts by “apparently legitimate Facebook profiles belonging to Border Patrol agents, including a supervisor based in El Paso, Texas, and an agent in Eagle Pass, Texas,” the report’s author A.C. Thompson wrote.

This is only the most recent example of the stories that can be found by digging into Facebook. Although Instagram is the new social media darling, Facebook, which also owns Instagram, still has twice the number of users and remains a popular place for conversations and interactions around specific topics. 

Although many groups are private and you might need an invitation or a source inside them to gain access, the world’s largest social network is a trove of publicly accessible information for reporters; you just need to know where to look.

I reached out to Brooke Williams, an award-winning investigative reporter and Associate Professor of the Practice of Computational Journalism at Boston University, and Henk van Ess, lead investigator for Bellingcat.com, whose fact-checking work using social media has earned a large online following, to talk about how they use Facebook to dig for sources and information for their investigations. Their answers were edited for length and clarity.

1. Use visible community groups like a phonebook

While it remains unclear how Thompson gained access to the Border Patrol group, Williams says you can start by looking at the groups that are public and the people who care about the issues you’re reporting on.

“I have quite a bit of success with finding sources in the community,” says Williams, “people on the ground who care about local issues, in particular, tend to form Facebook groups.” 

Williams uses Facebook groups as a phonebook of sorts when looking for sources.  For example, if a helicopter crashes in a neighborhood, she suggests searching for that specific neighborhood on Facebook, using specific keywords like the name of local streets or the particular district to find eyewitnesses. Her neighborhood in the Boston area, she recalls, has its own community page.

 Williams also recommends searching through Google Groups, where organizations often leave “breadcrumbs” in their message boards.

 “It’s not all of them,” she notes about these groups, “but it’s the ones that have their privacy settings that way.”

After speaking with Williams, I spent a few hours poking around in Google Groups and discovered that a surprising number of local and regional organizations had neglected their privacy settings. When looking through these group messages, I had a lot of success with keyword searches like “meeting minutes” or “schedule,” which surfaced documents and contact information for potential sources. While you can’t necessarily see who the messages are being sent to, the sender’s email is often visible.

This is just one example of a group with available contacts:

[Screenshots, redacted: (1) a Google Groups search for “Southwest Baltimore”; (2) a search for “meeting minutes”; (3) the “meeting minutes” results; (4) a set of meeting minutes]

2. Filter Facebook with free search tools created by journalists for journalists

Despite privacy settings, there’s plenty of low-hanging fruit on social media sites from Facebook to Twitter to Instagram, as a 2013 investigation by the New Orleans-based journalism organization The Lens shows. The nonprofit’s reporters used the “family members” section of a charter school CEO’s Facebook page to expose her nepotistic hiring of six relatives.

 “But if you know how to filter stuff… you can do way more,” van Ess says.

In 2017, van Ess helped dispel a hoax story about a rocket launcher found on the side of a road in Egypt using a combination of social media sleuthing, Google Earth and a free tool that creates panoramas from video to determine when the video clip of the launcher was originally shot. More recently, he used similar methods, as well as Google Chrome plug-ins for Instagram and the help of more than 60 Twitter users, to track down the Netherlands’ most wanted criminal.

He says journalists often overlook Facebook as a resource because “99 percent” of the stuff on Facebook is “photos of cats, dogs, food or memes,” but there’s useful information if you can sort through the deluge of personal posts. That’s why he created the online search tools graph.tips and whopostedwhat.com, so that investigators like him would have a “quick method to filter social media.”

Early-career journalists who’ve turned around quick breaking news blips or crime blotters for a paper’s city desk may be familiar with the Twitter tool TweetDeck (if not, get on it!). Whopostedwhat.com offers reporters a similar way to search keywords and find the accounts posting about a topic or event.

Here’s how it works: you can type in a keyword and choose a date range in the “Timerange” section to find all recent postings about that keyword. Clicking on a Facebook profile of interest, you can then copy and paste that Facebook account’s URL address into a search box on whopostedwhat.com to generate a “UID” number. This UID can then be used in the “Posts directly from/Posts associated with” section to search for instances when that profile mentioned that keyword. 

These tools are not foolproof. Oftentimes, searches will yield nothing. Currently, some functions (like searching for a specific day, month or year) don’t seem to work (more on that in the next section), but the payoff can be big.

“It enables you essentially to directly search somebody’s timeline rather than scrolling and scrolling and scrolling and having it load,” says Williams of graph.tips, which she employs in her own investigations. “You can’t control the timeline, but you can see connections between people which is applicable, I found, in countries other than the States.”

 While she declined to provide specific examples of how she uses graph.tips—she is using van Ess’s tools in a current investigation—she offered generalized scenarios in which it could come in handy. 

For instance, journalists can search “restaurants visited by” and type in the name of two politicians. “Or you could, like, put in ‘photos tagged with’ and name a politician or a lobbyist,” she says. She says location tagging is especially popular with people outside the US. 

Facebook has taken a lot of heat recently over privacy issues, so many OSINT tools have ceased to work or, like graph.tips, have had to adapt.

 3. Keep abreast of the changes to the platforms 

The trouble with these tools is their dependence on the social platform’s whim–or “ukase” as van Ess likes to call it. 

For example, on June 7, Facebook reduced the functionality of Graph Search, rendering van Ess’s graph.tips more difficult to use.

According to van Ess, Facebook blocked his attempts to fix graph.tips five times and it took another five days before he figured out a method to get around the new restrictions by problem-solving with the help of Twitter fans. The result is graph.tips/facebook.html, which he says takes longer than the original graph.tips, but allows you to search Facebook much in the same way as the original. 

Even though the site maintains the guarantee that it’s “completely free and open, as knowledge should be,” van Ess now requires first-time users to ask him directly to use the tool, in order to filter through the flood of requests he claims he has received. 

I have not yet been given access to the new graph.tips and can’t confirm his claims, but van Ess welcomes investigators interested in helping to improve its functionality. Much like his investigations, he crowdsources improvements to his search tools. Graph.tips users constantly iron out issues with the reworked tool on GitHub, which can be used like a subreddit for software developers.  

Ongoing user feedback, as well as instructions on how to use van Ess’s new Facebook tool, can be found here. A similar tool updated as recently as July 3 and created by the Czech OSINT company Intel X is available here, though information regarding this newer company is sparse. By contrast, all van Ess’s tools are supported by donations. 

The OSINT community has its own subreddit, where members share the latest tools of their trade. 

4. Use other social media tools to corroborate your findings

When it comes to social media investigations, van Ess says you need to combine tools with “strategy.” In other words, learn the language of the search tool–he shared this helpful blog post listing all of the advanced search operator codes a journalist would need while using Twitter’s advanced search feature.
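To give a flavor of that search language (these are generic examples of standard Twitter operators, not queries taken from van Ess’s post), you can stack operators such as:

from:bellingcat since:2019-01-01 until:2019-07-01 "rocket launcher"
"border patrol" filter:images -filter:retweets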

Williams also had a Twitter recommendation: TwXplorer. Created by the Knight Lab, this helpful tool allows reporters to filter Twitter for the 500 most recent uses of a word or phrase in 12 languages. The application will then list all the handles tweeting about that phrase as well as the most popular related hashtags.

Bonus: More search tools 

If you want even more open-source tools honed for journalistic purposes, Mike Reilley of The Society of Professional Journalists published this exhaustive and continuously updated list of online applications last month. Be warned though: not all of them are free to use.


[This article is originally published in icij.org written by Razzan Nakhlawi - Uploaded by AIRS Member: Robert Hensonw]

Web scraping: How to harvest data for untold stories

The thing about investigative reporting is, it’s hard work.

Sure, we’ve got more data available now. But data presents its own challenges: You’re tackling a massive pile of information, looking for the few best bits.

A technique called web scraping can help you extract information from a website that otherwise is not easily downloadable, using a piece of code or a program.

Web scraping gives you access to information living on the internet. If you can view it on a website, you can harvest it. And since you can collect it, you might as well automate that process for large datasets — at least if the website’s terms and conditions don’t say otherwise.

And it really helps. “You might go to an agency’s website to get some data you’re interested in, but the way they’ve got their web app set up you’ve got to click through 3,000 pages to get all of the information,” said Investigative Reporters and Editors training director Cody Winchester.

What’s the solution? Web scraping. You can write a script in a coding language (Python is one) that automatically flips through all of the pages and funnels the desired information into a spreadsheet. Or you could bypass coding completely and use an application to handle the web scraping, for example OutWit Hub, a point-and-click tool that recognizes online elements, then downloads and organizes them into datasets.
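A minimal sketch of what such a Python script might look like, assuming a hypothetical agency site that paginates results with a ?page= parameter and lists them in an HTML table (the URL and column names below are placeholders, not a real agency):

import csv

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example-agency.gov/records"  # hypothetical paginated listing

with open("records.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "date", "status"])  # adjust to the real columns
    for page in range(1, 3001):  # the "3,000 pages" case
        resp = requests.get(BASE_URL, params={"page": page}, timeout=30)
        if resp.status_code != 200:
            break
        soup = BeautifulSoup(resp.text, "html.parser")
        rows = soup.select("table tr")[1:]  # skip the header row
        if not rows:
            break  # no more results, stop paging
        for row in rows:
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            writer.writerow(cells)

As noted above, check the site’s terms of service (and pace your requests) before pointing a loop like this at a real website.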

Why does it matter?

Web scraping gives reporters the ability to create their own datasets with scraped information, opening the possibility of discovering new stories — a priority for investigative journalists.

Jodi Upton, the Knight Chair of Data and Explanatory Journalism at Syracuse University, began her career doing old-school “scraping.” Before online databases were widely used, when she only had access to paper records, she created her own databases manually. For Upton’s work, it was a necessity.


“When you’re trying to do news stories or investigative projects that require really interesting data, often it means you are creating a database yourself,” Upton said. Now it’s a lot easier, though the raw product, data itself, isn’t always easy to get your hands on.

There isn’t much incentive for organizations to disclose important data unless required to by law. Even then, the government does a poor job of data maintenance.

“We do have some data from the government, but we know that it is so inaccurately kept that there are some really good stories in finding out just how wrong they are,” Upton said.

Working on USA Today’s Mass Killings project, an investigation into Federal Bureau of Investigation mass homicide data, Upton and the rest of the data team scoured FBI data for mass homicides. The data was so poorly kept that the team had to hand-check and verify every incident itself. They found many more incidents the FBI had failed to log.

Upton said she was concerned. “This is our premier crime-fighting agency in the U.S. and when it comes to mass killings, they’re right around only 57 percent of the time.”

Sometimes the government will simply refuse to hand over data sets.

IRE’s Winchester described his attempt to get a database from a South Dakota government lobbyist, who argued that putting data up on a webpage was transparent enough:

“I put in a records request to get the data in the database that was powering their web app, and they successfully argued, ‘We’re already making the information available, we don’t have to do anything special to give it to you as data’.”

Aside from structured data, which is organized to make it more accessible, some stories are born from journalists giving structure to unstructured information. In 2013, Reuters investigated a marketplace for adopted children, who were being offered to strangers on Yahoo message boards by the parents or guardians who had taken them in.

The investigative team scraped the message boards and found 261 children on offer. The team was then able to organize the children by gender, age, nationality and by their situations, such as having special needs or a history of abuse.

“That is not a dataset that a government agency produces. That is not a dataset that is easy to obtain in any way. It was just scraping effectively; a social media scraping,” Upton said.

How could you use web scraping?

Samantha Sunne, a freelance data and investigative reporter, created a whole tutorial for those without coding experience. “When I’m investigating stories as a reporter, I don’t actually write code that often,” Sunne said.

Instead, she uses Google Sheets to scrape tables and lists off a single page, using a simple formula within the program. The formula imports a few HTML elements into Google Sheets and is easy enough for anyone with basic HTML knowledge to follow.
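The formula she refers to is most likely Google Sheets’ built-in IMPORTHTML function (IMPORTXML handles other page elements); the URL, table index and XPath below are placeholders, not examples taken from her tutorial:

=IMPORTHTML("https://example.com/jobs", "table", 1)
=IMPORTXML("https://example.com/jobs", "//h2/a/@href")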

You can read her entire tutorial here.

“I’ve used it for court documents at a local courthouse, I use it for job postings for a newsletter I write about journalism happenings,” Sunne said.

“It’s a spreadsheet that automatically updates from like 30 different job boards. It makes the most sense for things that continually update like that.”

How does ICIJ use web scraping? (This is for our more technically savvy readers!)

ICIJ developer Miguel Fiandor handles data harvesting on a much grander scale, trawling hundreds of thousands of financial documents.

Fiandor’s process begins with opening Chrome’s DevTools in the browser. It’s a mode that allows the user to see the inner workings of a website and play around with its code.

Then he uses the ‘Network’ tab in the Developer Tools window to find the exact request he needs. (A request is how a browser retrieves a webpage’s files from the website’s servers.)

He studies the communication between the website and his browser and isolates the requests he wants to target. Fiandor tests those requests with cURL, a Linux command that he can use from his computer terminal. This bypasses the need for a browser.

Next, Fiandor uses BeautifulSoup, a library that needs to be installed through Python.

Code for scraping a corporate registry used in the Paradise Papers.

BeautifulSoup allows the user to parse HTML, or separate it into useful elements. After the request, he’ll save the data onto his computer, then run his script to route those elements into a spreadsheet.
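A stripped-down sketch of that request-then-parse pattern, with a placeholder URL standing in for the registry Fiandor actually targeted:

import csv

import requests
from bs4 import BeautifulSoup

# Replay the request isolated in DevTools, outside the browser
# (the URL is a placeholder, not ICIJ's actual target).
resp = requests.get("https://example-registry.org/companies?letter=A", timeout=30)
resp.raise_for_status()

# Parse the HTML into useful elements.
soup = BeautifulSoup(resp.text, "html.parser")
records = []
for row in soup.select("table tr")[1:]:
    records.append([td.get_text(strip=True) for td in row.find_all("td")])

# Route those elements into a spreadsheet-friendly file.
with open("registry.csv", "w", newline="") as f:
    csv.writer(f).writerows(records)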

Simple enough, right?


[This article is originally published in ijnet.org written by ALEXANDRA JEGERS - Uploaded by AIRS Member: Daniel K. Henry]

If you didn’t make it to Seoul for this year’s Uncovering Asia conference — or just couldn’t be at two panels at the same time — never fear, tipsheets from the impressive speakers are here! But just in case you can’t decide where to start, here are five presentations that are definitely worth checking out.

How to Make a Great Investigative Podcast

The human voice is a powerful tool. When someone is telling you a good story, you just can’t stop listening. It is, however, sometimes difficult to construct a good storyline for radio — especially if that’s new territory for you. In this excellent tipsheet, radio veteran Sandra Bartlett and Citra Prastuti, chief editor of Indonesian radio network Kantor Berita Radio, explain how to create images in your listener’s brain. Be sure to check out this story on some of their favorite investigative podcasts.

Best Verification Tools

From Russian trolls to teenage boys in Macedonia, Craig Silverman has exposed a wide gamut of disinformation operations around the world. He shared his experiences and research tips on a panel on fake news. Although years of experience like Silverman’s are certainly helpful, you don’t have to be an expert to spot fake news — or even a tech geek. In his tip sheet, Silverman continuously compiles tools that will help you easily check the accuracy of your sources.

Mojo in a Nutshell

Never heard of SCRAP or DCL? Then you are no different to most of the participants at the mojo workshop of award-winning television reporter Ivo Burum. Mojo is short for mobile journalism, which is becoming increasingly important in competitive, fast-moving newsrooms. Burum breaks down how to shoot, edit and publish an extraordinary video story just using your smartphone. Be sure not to miss his YouTube videos on mastering KineMaster or iMovie Basics or any of his regular columns on GIJN.

How to Track Criminal Networks

Transnational organized crime today generates $2 trillion in annual revenue, about the size of the UK economy, according to the UN Office on Drugs and Crime. It’s no wonder that, with that kind of cash on hand, authorities throughout the world often seem powerless to police these networks. But almost everybody leaves a digital trail, according to international affairs and crime reporter Alia Allana, who spoke at the Investigating Criminal Networks panel.

Web Scraping for Non-coders

Ever had a PDF document that you could not crawl with Ctrl + F? Or looked for specific information on a web page that has an endless number of pages? When documents have hundreds of pages or websites scroll for miles, it can be frustrating — not to mention time-consuming. With Pinar Dag and Kuang Keng Kuek Ser’s guidance, you’ll be web scraping like a pro in no time.

This post was originally published by the Global Investigative Journalism Network.

Alexandra Jegers is a journalist from Germany who has completed the KAS multimedia program. She has studied economics in Germany and Spain and now writes for Handelsblatt, Capital, and Wirtschaftswoche.

Main image CC-licensed by Unsplash via Evan Kirby.


Fake peer reviews are a problem in academic publishing. A big problem. Many publishers are taking proactive steps to limit the effects, but massive purges of papers tainted by problematic reviews continue to occur; to date, more than 500 papers have been retracted for this reason. In an effort to help, Clarivate Analytics is unveiling a new tool as part of the December 2017 release of ScholarOne Manuscripts, its peer review and submission software. We spoke to Chris Heid, Head of Product for ScholarOne, about the new pilot program to detect unusual submission and peer review activity that may warrant further investigation by the journal.

Retraction Watch: Fake peer reviews are a major problem in publishing, but many publishers are hyper-aware of it and even making changes to their processes, such as not allowing authors to recommend reviewers. Why do you think the industry needs a tool to help detect fake reviews?

Chris Heid: Although the evidence is clear that allowing authors to suggest reviewers increases the chances of peer review fraud, there are still significant numbers of journals that use this as one of many methods to find qualified reviewers. We estimate that about half of the journals using ScholarOne Manuscripts continue to allow authors to add recommended reviewers during submission despite the risk.

The reason that journals don’t completely lock down these suggestions from authors, or limit profiles to verified institutional addresses, is that journals continue to struggle to find peer reviewers. According to our analysis of five years of peer review trends on ScholarOne journals, the average number of invitations sent to reviewers for research articles has almost doubled in the last five years.

Instead of trying to eliminate all risk and make the process even slower for peer review, journal publishers take a calculated risk and rely on human intervention to mitigate it. This adds both time to the overall process, and costs for the publisher to staff extra background checking. This means peer review is slower and costs publishers more for every article.

This tool’s goal is to improve a journal’s reputation by simplifying the management of a process that relies on hundreds or even thousands of individual stakeholders. Even though the vast majority of peer reviews are legitimate, the reputational risks are very real for publishers. Why continue to work based solely on trust and human efforts when technology can automate this for us?

Clarivate Analytics is leading the charge on multiple fronts to provide the tools and information needed to combat fraud and improve the peer review process from end to end.

For example, by the end of the year, journals can use Publons Reviewer Locator/Connect (final name undecided) — the most comprehensive and precise reviewer search tool — to help identify the right reviewers, assess their competency, history and availability, contact them and invite them to review.

Recognition through Publons helps motivate reviewers to do a thoughtful and efficient job. The fraud prevention tool follows the submission of the review report to flag potential fraud.

RW: Can you say briefly how the tool works? What it looks for, etc? Anyone can spot a reviewer that’s not using an institutional email address, so what other qualities help signify a review is fake?

CH: The presence of a non-institutional email or the absence of a Publons reviewer profile with a verified review history is not foolproof for identifying peer review fraud. The fraud prevention tool evaluates 30+ factors based on web traffic, profile information, submission stats and other server data, compiled by our proprietary algorithm, to find fake profiles, impersonators and other unusual activity. This happens multiple times throughout the submission and review process.

By themselves, these factors may not trigger an alert, but combined with other actions, they can increase the risk level of a submission. From there, it is up to the journal editor and/or publisher to determine the next steps. In the long run, this tool will help to reduce the amount of retractions by highlighting issues during the submission process, instead of after publication.

RW: How can journals and publishers get access to the tool? Will there be a fee?

CH: Because the integrity of published research is at risk due to peer review fraud, Clarivate is offering this as a core, free feature in the next ScholarOne release (December 2017). Journals may request the tool to be activated in the interface at any time. Report access levels can also be configured by role for each individual journal.

RW: Have you tested the tool’s effectiveness? Do you have any data on its rate of success, as well as false negatives or positives?

CH: The tool relies on alerts based on the right combination of factors and leaves the decision to the journal editor or publisher. This is similar to alerts a bank may issue about potential fraud. For example, if you receive an alert about unusual charges on your account, it could be legitimate if you’re on vacation or it could indicate actual credit card theft.

Clarivate actively worked on this capability for the past year, continuing to balance and refine the approach with feedback from publishers who are managing this risk every day. Refinements were made based on feedback including tool sensitivity and user interface.

Early testers indicated that a number of alerts resulted in direct action, including the rejection of a paper that was already accepted but unpublished, and a re-review of another paper by an editor and external reviewer. Once the feature is live in December, we expect additional refinement through feedback tools.

 Source: This article was published retractionwatch.com By Chris Heid


'Civic technologist' Friedrich Lindenberg shares a range of tools journalists can use in investigative journalism projects

Investigative journalism has long been the marker by which news organizations – and journalists – measure their worth.

"As a journalist, your main tool is talking to people and asking the right questions of the right people," said civic technologist and self-described "OpenGov and data journalism geek" Friedrich Lindenberg in a webinar on investigative journalism tools for the International Centre for Journalists last week.

"This is still true, but also you can ask the right questions with the right databases. You can ask the right questions with the right tools."

Lindenberg listed an arsenal of tools the investigative journalist can equip themselves with. Here are some of the highlights. 

DocumentCloud

Lindenberg described DocumentCloud as a "shared folder of documents", offering different folders that can be used for various investigations, control over who can access which documents, the ability to annotate different parts of documents, search throughout and embed segments or entire documents.

Even better, DocumentCloud looks for "entities" – such as people, companies, countries, institutions – identifies them and makes them searchable, which is especially useful for legal documents that may stretch into hundreds of pages when you are only interested in a few key points.
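Those searches can also be scripted; here is a minimal sketch assuming the python-documentcloud wrapper and placeholder credentials (the method names reflect that library’s documented API, but check the current docs before relying on them):

from documentcloud import DocumentCloud

client = DocumentCloud("you@example.com", "password")  # placeholder credentials

# Full-text search across accessible documents, then list the matches.
for doc in client.documents.search("Eagle Pass"):
    print(doc.title, doc.canonical_url)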

DocumentCloud is run by IRE but Lindenberg encouraged journalists to contact him at SourceAfrica.net, where an open source version of the software is available.
Screengrab from documentcloud.org

Overview

A "bit more of an expert tool", according to Lindenberg, Overview lets the user import documents from DocumentCloud or CSV files and then counts the frequency of words to make a "hierarchy of terms" for words.

When used this way, Overview can give a quick rundown of large numbers of documents, making it easier to understand the core topics.
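Overview handles this for you, but the underlying idea is simple enough to sketch in a few lines of Python (a conceptual illustration only, not Overview's actual code):

import re
from collections import Counter

docs = ["text of the first document ...", "text of the second document ..."]  # placeholder documents

counts = Counter()
for doc in docs:
    counts.update(re.findall(r"[a-z']+", doc.lower()))

# The most frequent terms hint at the core topics of the pile.
print(counts.most_common(20))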

OpenCorporates

Popularised by All the President's Men, the dramatisation of the Watergate scandal, "follow the money" is one of the mantras of investigative journalists everywhere.

Many large and expensive business registries exist to track the myriad connections between individuals and companies, but few are within the reach of the press.

One of those few is OpenCorporates, where users can search by name or company and filter by geographical jurisdiction.
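The same search is also exposed through OpenCorporates' REST API; a minimal sketch in Python, with the endpoint and response fields written as I understand the v0.4 API (verify against the current documentation):

import requests

resp = requests.get(
    "https://api.opencorporates.com/v0.4/companies/search",
    params={"q": "Acme Holdings", "jurisdiction_code": "gb"},  # name plus jurisdiction filter
    timeout=30,
)
resp.raise_for_status()

for result in resp.json()["results"]["companies"]:
    company = result["company"]
    print(company["name"], company["jurisdiction_code"], company["company_number"])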

DueDil

DueDil has a similar function to OpenCorporates but is a "slightly better research tool", said Lindenberg, as you can narrow the search on individuals with similar names by searching by birth date.

Where OpenCorporates has a global range of company information, DueDil mainly draws on UK companies. Both operate on a freemium model with monthly fees for greater access.

Investigative Dashboard

Both OpenCorporates and DueDil were built for business purposes, helping people to conduct due diligence on companies and individuals before signing any contracts.

Investigative Dashboard, though, is tailor-made for journalists. Users can search business records scraped from websites in a range of countries or go through the directory of more than 450 business registries, company lists and "procurement databases" – which highlight the 'hot point' where companies and governments do business – to find detailed information.

"They also have a broad network of researchers in different regions," said Lindenberg, "and they will look at other databases that they will be familiar with and maybe even have stringers and contacts on the ground who will find information and documents."

Paul Radu, an investigative reporter at the OCCRP who helped build the Investigative Dashboard, told Journalism.co.uk the platform has researchers in Eastern Europe, Africa, the Middle East and Latin America.

"We do pro bono due to diligence work for journalists and activists and these people have access to all the open databases," he said. "But also we managed to get some funding to access some pretty expensive databases that are very useful in tracking down the information across borders."
Screengrab from investigativedashboard.org

Tabula

Governments are partial to releasing reports and figures in PDF files, making it difficult for journalists looking to analyse and investigate the data contained within.

In the UK, you can specify usable file formats (Excel or CSV for example) in Freedom of Information requests. But if you are still faced with data locked up in a PDF, you need Tabula.

"It's the gateway drug to data journalism", said Lindenberg of Tabula.

Simply download and install the software, open a PDF in the program, select a table and Tabula will convert it into a workable file format. Magic.
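If you would rather script the same extraction, a separate Python wrapper, tabula-py, drives Tabula's engine from code; a minimal sketch, assuming tabula-py and Java are installed and using a placeholder file name:

import tabula

# Pull every table out of the PDF as pandas DataFrames.
tables = tabula.read_pdf("report.pdf", pages="all", multiple_tables=True)
for i, table in enumerate(tables):
    table.to_csv(f"table_{i}.csv", index=False)

# Or convert the whole document to CSV in one call.
tabula.convert_into("report.pdf", "report.csv", output_format="csv", pages="all")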

Lindenberg suggested many more tools to help journalists analyse documents and data, scrape web pages and further their investigations alongside the hour-long webinar.

However, he stressed that for best results viewers should pick one tool for a project and learn to use it well, rather than trying to get to grips with lots of new things at once.

"Learning these tools requires a bit of time experimentation," he said, "a bit of willingness to get into this new thing and once you've done that you will get some benefits out of it.

"If you're saying 'I'm not a computer person' I want you to stop doing that and say instead that you're a journalist who has arrived in the 21st century and is using digital tools in a way to conduct fantastic investigations."

Source: This article was published in journalism.co.uk By Paul Radu

