
OSINT, or open-source intelligence, refers to information about businesses or people that can be collected from publicly available online sources. Gathering it efficiently requires the right tools, so here are the 10 best OSINT tools for 2020.

In a world full of information overload, it is natural to feel the need to sift out the useful information. To do so, organizations globally employ a range of tools, both paid and unpaid. The latter category falls into the domain of open-source intelligence (OSINT) and can be incredibly helpful, especially when you’re looking to save hefty fees on market intelligence reports. Keeping this in mind, here are the 10 best OSINT tools for 2020:

1. OSINT Framework

Featuring over 30 categories of potential data sources, including the dark web, social networks, and malicious file analysis, the OSINT Framework tool shows you the various ways in which you could access such types of data.

For example, let’s say you wanted to know where you could get more information about the dark web. You would simply click the relevant field on the tree, and it would display a variety of sources you could use to further your research.


This saves you a ton of time you would otherwise spend searching for the right tools, and it is literally a life-saver!

2. Shodan

Known as the search engine for Internet of Things (IoT) devices, Shodan lets you find information about just about any device connected to the internet, whether it is a refrigerator, a database, a webcam, an industrial control system, or a simple smart television.

The advantage of Shodan is that hardly any other service offers such depth, which not only allows you to collect valuable intelligence but can also give you a competitive advantage if you’re a business looking to learn more about your competition.


To add to its credibility, Shodan reports that it is used by 81% of Fortune 100 companies and more than 1,000 universities.
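If you want to script such lookups rather than use the website, Shodan also exposes its search through an official Python library. The snippet below is a minimal sketch using that documented client; it assumes you have installed the shodan package and hold a valid API key, and the query string is just a placeholder.

import shodan

API_KEY = "YOUR_SHODAN_API_KEY"  # placeholder; requires a Shodan account
api = shodan.Shodan(API_KEY)

try:
    # Search for a keyword across internet-connected devices
    results = api.search("webcam")
    print("Results found:", results["total"])
    for match in results["matches"][:5]:
        print(match["ip_str"], match.get("port"), match.get("org"))
except shodan.APIError as exc:
    print("Shodan error:", exc)

Free accounts are rate-limited, so for bulk collection you would typically page through results or use the library's search_cursor helper.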

3. That’s Them

How many times have you wanted to know more about an individual before moving forward with them on a business opportunity or anything else? That’s Them helps you do just that by letting you run background checks using an individual’s full name and city and state of residence, phone number, or full address. In return, it gives you access to their police records, lawsuits, asset ownership details, addresses, and phone numbers.


These checks, though, are currently available only within the United States. Furthermore, you need to subscribe to a plan to get more than just basic information about someone.

 

4. N2YO.com

Allowing you to track satellites from afar, N2YO is a great tool for space enthusiasts. It features a menu of the most frequently tracked satellites in addition to a database where you can run custom queries on parameters such as the Space Command ID, launch date, satellite name, and international designator. You can also set up custom alerts for space station events, along with a live stream of the International Space Station (ISS)!


5. Google & Google Images

While Google’s main search engine needs no introduction, with its vast array of search results including videos and curated news, the lesser-used Google Images can also come in very handy.

Apart from the obvious function of letting you search for images, it allows you to reverse-search any image to find its real origin and therefore save a lot of time. For example, if I had an image that I needed to trace to its original uploader in order to obtain copyright permissions, I would simply upload it to Google Images, which would search the web to find me the source.


Another incredibly helpful feature is the ability to filter images by their resolution, size, and copyright license, helping you find highly relevant images. Furthermore, as it scours images from across the internet, it returns far more results than free stock-image sites like Pixabay.

6. Yandex Images

The Russian counterweight to America’s Google, Yandex is extremely popular in Russia and lets users search the internet for thousands of images. This is in addition to its reverse-image functionality, which is remarkably similar to Google’s. A nice touch is that you can sort images by category, which can make your searches more specific and accurate.


Tip: In my personal experience, Yandex image search results are far more accurate and in-depth than Google Images.

7. Censys

In a nutshell, Censys is built to help you secure your digital assets. It works by letting users enter their websites, IP addresses, and other digital asset identifiers, which it then analyzes for vulnerabilities. Once done, it presents actionable insights to its users.

But that is not all. It is one thing to secure your company’s networks, but quite another to ensure that work-from-home employees are not vulnerable through their own setups. Keeping this in mind, you can also “scan your employees’ home networks for exposures and vulnerabilities.”


8. KnowEm

Every brand owner knows the disappointment of finding that the social media handle they wanted for their business is already taken. KnowEm tackles this by letting you check a username across over 500 social media networks, including all the famous ones, with one simple search.

Additionally, it can also check the availability of domain names, but this isn’t unique, since pretty much every domain registrar does the same.

On the other hand, if you want the service to automatically claim a set of profiles under a username of your choice, four different paid plans are also offered.


9. The Internet Archive

A bit nostalgic about the 1990s? The Internet Archive is a time machine that lets you browse different versions of pretty much any website by date. This means that if you wanted to see how a specific website looked on, say, 24 June 2003, you could do so using the Internet Archive’s Wayback Machine.


One potential use of the tool is for analyzing a competitor’s web presence over a time period and using it as market intelligence.
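You can also query archived snapshots programmatically: the Internet Archive publishes a simple availability endpoint that returns the capture closest to a given date. The sketch below calls it with Python’s requests library; the target URL and date are just examples.

import requests

# Ask the Wayback Machine for the snapshot closest to 24 June 2003
params = {"url": "example.com", "timestamp": "20030624"}
resp = requests.get("https://archive.org/wayback/available", params=params, timeout=30)
resp.raise_for_status()

closest = resp.json().get("archived_snapshots", {}).get("closest")
if closest and closest.get("available"):
    print("Archived copy:", closest["url"], "captured", closest["timestamp"])
else:
    print("No archived snapshot found for that date.")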

10. HaveIBeenPwned

With database breaches happening every day, it’s only a matter of time before your data is also exposed. Keeping a check is therefore vital so that you can change your credentials and other details in time. HIBP lets you do exactly that by entering either your email address or a password.
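For password checks specifically, HIBP offers a documented “range” API that never sends your full password anywhere: you submit only the first five characters of its SHA-1 hash and compare the returned suffixes locally. Below is a minimal Python sketch of that flow; the sample password is obviously a placeholder.

import hashlib
import requests

def pwned_count(password: str) -> int:
    # k-anonymity: only the first 5 hex chars of the SHA-1 hash leave your machine
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    resp = requests.get("https://api.pwnedpasswords.com/range/" + prefix, timeout=30)
    resp.raise_for_status()
    for line in resp.text.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0

print(pwned_count("password123"), "breaches contain this password")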


To conclude, although this list is by no means exhaustive, these 10 OSINT tools will save you not only time but also a lot of money. Professionals from various walks of life use them every day, so it makes perfect sense to add them to your toolkit as well.

[Source: This article was published in hackread.com By Sudais Asif - Uploaded by the Association Member: Clara Johnson]

Categorized in Investigative Research

Crowdfunding has become the de facto way to support individual ventures and philanthropic efforts. But as crowdfunding platforms have risen to prominence, they’ve also attracted malicious actors who take advantage of unsuspecting donors. Last August, a report from the Verge investigated the Dragonfly Futurefön, a decade-long fraud operation that cost victims nearly $6 million and caught the attention of the FBI. Two years ago, the U.S. Federal Trade Commission announced it was looking into a campaign for a Wi-Fi-enabled, battery-powered backpack that disappeared with more than $700,000.

GoFundMe previously said fraudulent campaigns make up less than 0.1% of all those on its platform, but with millions of new projects launching each year, many bad actors are able to avoid detection. To help catch them, researchers at University College London, Telefonica Research, and the London School of Economics devised an AI system that takes into account textual and image-based features to classify fraudulent crowdfunding behavior at the moment of publication. They claim it’s up to 90.14% accurate at distinguishing between fraudulent and legitimate crowdfunding behavior, even without any user or donation activity.

While two of the largest crowdfunding platforms on the web — GoFundMe and Kickstarter — employ forms of automation to spot potential fraud, neither claims to take the AI-driven approach advocated by the study coauthors. A spokesperson for GoFundMe told VentureBeat the company relies on the “dedicated experts” on its trust and safety team, who use technology “on par with the financial industry” and community reports to spot fraudulent campaigns. To do this, they look at things like:

  • Whether the campaign abides by the terms of service
  • Whether it provides enough information for donors
  • Whether it’s plagiarized
  • Who started the campaign
  • Who is withdrawing funds
  • Who should be receiving funds

Kickstarter says it doesn’t use AI or machine learning tools to prevent fraud, excepting proprietary automated tools, and that the majority of its investigative work is performed manually by looking at what signals surface and analyzing them to guide any action taken. A spokesperson told VentureBeat that in 2018 Kickstarter’s team suspended 354 projects and 509,487 accounts and banned 5,397 users for violating the company’s rules and guidelines — 8 times as many as it suspended in 2017.

The researchers would argue those efforts don’t go far enough. “We find that fraud is a small percentage of the crowdfunding ecosystem, but an insidious problem. It corrodes the trust ecosystem on which these platforms operate, endangering the support that thousands of people receive year on year,” they wrote. “[Crowdfunding platforms aren’t properly] incentivized to combat fraud among users and the campaigns they launch: On the one hand, a platform’s revenue is directly proportional to the number of transactions performed (since the platform charges a fixed amount per donation); on the other hand, if a platform is transparent with respect to how much fraud it has, it may discourage potential donors from participating.”

To build a corpus that could be used to “teach” the above-mentioned system to pick out fraudulent campaigns, the researchers sourced entries from GoFraudMe, a resource that aims to catalog fraudulent cases on the platform. They then created two manually annotated data sets focusing on the health domain, where the monetary and emotional stakes tend to be high. One set contained 191 campaigns from GoFundMe’s medical category, while the other contained 350 campaigns from different crowdfunding platforms (Indiegogo, GoFundMe, MightyCause, Fundrazr, and Fundly) that were directly related to organ transplants.

 

Human annotators labeled each of the roughly 700 campaigns in the corpora as “fraud” or “not-fraud” according to guidelines that included factors like evidence of contradictory information, a lack of engagement on the part of donors, and participation of the creator in other campaigns. Next, the researchers examined different textual and visual cues that might inform the system’s analysis:

  • Sentiment analysis: The team extracted the sentiments and tones expressed in campaign descriptions using IBM’s Watson natural language processing service. They computed the sentiment as a probability across five emotions (sadness, joy, fear, disgust, and anger) before analyzing confidence scores for seven possible tones (frustration, satisfaction, excitement, politeness, impoliteness, sadness, and sympathy).
  • Complexity and language choice: Operating on the assumption that fraudsters prefer simpler language and shorter sentences, the researchers checked language complexity and word choice in the campaign descriptions. They looked at both a series of readability scores and language features like function words, personal pronouns, and average syllables per word, as well as the total number of characters (a minimal sketch of such features appears after this list).
  • Form of the text: The coauthors examined the visual structure of campaign text, looking at things like whether the letters were all lowercase or all uppercase and the number of emojis in the text.
  • Word importance and named-entity recognition: The team computed word importance for the text in the campaign description, revealing similarities (and dissimilarities) among campaigns. They also identified proper nouns, numeric entities, and currencies in the text and assigned them to a finite set of categories.
  • Emotion representation: The researchers repurposed a pretrained AI model to classify campaign images as evoking one of eight emotions (amusement, anger, awe, contentment, disgust, excitement, fear, and sadness) by fine-tuning it on 23,000 emotion-labeled images from Flickr and Instagram.
  • Appearance and semantic representation: Using another AI model, the researchers extracted image appearance representations that provided a description of each image, like dominant colors, the textures of the edges of segments, and the presence of certain objects. They also used a face detector algorithm to estimate the number of faces present in each image.
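The exact feature pipeline is the researchers’ own, but a few of the textual signals described above are easy to illustrate. The sketch below computes a handful of comparable features (length, average word length, pronoun use, uppercase ratio) in plain Python; it is an illustration of the idea, not the study’s actual code.

import re

PRONOUNS = {"i", "me", "my", "we", "our", "you", "your", "he", "she", "they"}

def text_features(description: str) -> dict:
    words = re.findall(r"[A-Za-z']+", description)
    letters = [c for c in description if c.isalpha()]
    return {
        "n_chars": len(description),
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "pronoun_rate": sum(w.lower() in PRONOUNS for w in words) / max(len(words), 1),
        "uppercase_ratio": sum(c.isupper() for c in letters) / max(len(letters), 1),
    }

print(text_features("Please help us, my family urgently needs support for surgery."))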

After boiling many thousands of possible features down to 71 textual and 501 visual variables, the researchers used them to train a machine learning model to automatically detect fraudulent campaigns. Arriving at this ensemble model required building sub-models to classify images and text as fraudulent or not fraudulent and combining the results into a single score for each campaign.
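The paper’s actual ensemble is more elaborate, but the basic idea of combining sub-model outputs into a single campaign-level score can be sketched as follows; the weights and threshold here are invented purely for illustration.

def campaign_score(text_prob: float, image_probs: list, text_weight: float = 0.6) -> float:
    # Average the per-image fraud probabilities, then blend with the text sub-model
    image_prob = sum(image_probs) / max(len(image_probs), 1)
    return text_weight * text_prob + (1 - text_weight) * image_prob

score = campaign_score(text_prob=0.82, image_probs=[0.4, 0.7])
print("fraud" if score >= 0.5 else "not fraud", round(score, 2))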

The coauthors claim their approach revealed peculiar trends, like the fact that legitimate campaigns are more likely to have images with at least one face compared with fraudulent campaigns. On the other hand, fraudulent campaigns are generally more desperate in their appeals, in contrast with legitimate campaigns’ descriptiveness and openness about circumstances.

“In recent years, crowdfunding has emerged as a means of making personal appeals for financial support to members of the public … The community trusts that the individual who requests support, whatever the task, is doing so without malicious intent,” the researchers wrote. “However, time and again, fraudulent cases come to light, ranging from fake objectives to embezzlement. Fraudsters often fly under the radar and defraud people of what adds up to tens of millions, under the guise of crowdfunding support, enabled by small individual donations. Detecting and preventing fraud is thus an adversarial problem. Inevitably, perpetrators adapt and attempt to bypass whatever system is deployed to prevent their malicious schemes.”

It’s possible that the system might be latching onto certain features in making its predictions, exhibiting a bias that’s not obvious at first glance. That’s why the coauthors plan to improve it by taking into account sources of labeling bias and test its robustness against unlabeled medically related campaigns across crowdfunding platforms.

“This is a significant step in building a system that is preemptive (e.g., a browser plugin) as opposed to reactive,” they wrote. “We believe our method could help build trust in this ecosystem by allowing potential donors to vet campaigns before contributing.”

[Source: This article was published in venturebeat.com By Kyle Wiggers - Uploaded by the Association Member: Jeremy Frink]

Categorized in Investigative Research

In the popular consciousness, the dark web is mostly known as the meeting place of terrorists and extortionist hackers. While there are other, less malicious players afoot, corporations and organizations need to know the real dangers and how to protect against them.

Dark. Mysterious. A den of thieves. A front for freedom fighters. It is many things for many different kinds of people, all of whom by nature or necessity find themselves driven to the fringes of digital society. It is the dark web. 

There’s still plenty of misinformation floating around out there about this obscure corner of the internet. The average cyber citizen is unaware of its very existence. Even for those intimately familiar with the dark web, accurate predictions as to its behavior and future effect on broader internet culture have remained elusive; predictions foretelling its mainstreaming, for instance, seem less and less likely with each passing year. The problem is, this is one case where ignorance isn’t always bliss. Dark web relevance to the general population is becoming more painfully apparent with every breaking news story about yet another data breach.

The amount of personal information accessible via a web connection these days is staggering. Names, addresses, and phone numbers are only the tip of the iceberg. Credit card information, marital status, browsing histories, purchase histories, medical histories (a favorite target of hackers these days) and so much more—every bit and byte of this data is at risk of theft, ransom, exposure and exploitation. A person’s entire life can be up for sale on the dark web without them being any the wiser. That is until their credit card comes up overdrawn, or worse, a mysterious and threatening email graces their inbox threatening to expose some very private information.

But despite the fact that it is the individual being exposed, the ones who truly have to worry are those entities entrusted with storing the individual data of their millions of users. The dark web is a potential nightmare for banks, corporations, government bureaus, health care providers—pretty much any entity with large databases storing sensitive (i.e., valuable) information. Many of these entities are waking up to the dangers, some rudely so, and are too late to avoid paying out a hefty ransom or fine depending on how they handle the situation. Whatever the case, the true cost is often to the reputation of the entity itself, and it is sometimes unrecoverable.

It should be obvious at this point that the dark web cannot be ignored. The first step to taking it seriously is to understand what it is and where it came from.

The landscape

Perhaps the most common misconception regarding the dark web begins with the internet itself. Contrary to popular sentiment, Google does not know all. In fact, it is not even close. Sundar Pichai and his legions of Googlers only index pages they can access, which by current estimates hover around the 60 billion mark. Sounds like a lot, but in reality this is only the surface web, a paltry 0.2% to 0.25% of digital space.

Home for the bulk of our data, the other 99.75% is known as the deep web. Research on deep web size is somewhat dated but the conditions the findings are based on appear to point to a growing size disparity, if any changes have occurred at all.

 

Unlike the surface web, which is made up of all networked information discoverable via public internet browsing, the deep web is all networked information blocked and hidden from public browsing.

Take Amazon as an example. It has its product pages, curated specifically to customer browsing habits and seemingly eerily aware of conversations people have had around their Alexa—this is the surface web. But powering this streamlined customer experience are databases storing details for hundreds of millions of customers, including personally identifiable information (PII), credit card and billing information, purchase history, and the like. Then there are databases for the millions of vendors, warehouse databases, logistical databases, corporate intranets, and so on. All in all, you are looking at a foundational data well some 400 to 500 times larger than the visible surface.

The dark web is technically a part of this deep web rubric, meeting the criteria of being hidden from indexing by common web browsers. And although microscopically small in comparison it can have an outsized effect on the overall superstructure, sort of like a virus or a cure, depending on how it is used. In the Amazon example, where the dark web fits in is that a portion of its members would like nothing better than to access its deep web data for any number of nefarious purposes, including sale, ransom, or just to sow a bit of plain old anarchic chaos.

Such activities do not interest all dark web users, of course, with many seeing anonymity as an opportunity to fight off corruption rather than be a part of it. The dark web is a complex place, and to fully appreciate this shadow war of villains and vigilantes, how it can affect millions of people every now and then when it spills over into the light, first you have to understand its origins.

Breaking down the numbers

Anonymity is not without its challenges when it comes to mapping out hard figures. The key is to focus on commerce, a clear and reliable demarcating line. For the most part, those only seeking anonymity can stick to hidden chat rooms and the like. However, if a user is looking to engage in illegal activity, in most instances they’re going to have to pay for it. Several past studies and more recent work provide workable insight when extrapolating along this logic path.

First, a 2013 study analyzing 2,618 services being offered found over 44% to involve illicit activity. That number jumped to 57% in a follow up study conducted in 2016. These studies alone project an accelerating upward trend. Short of a more recent comprehensive study, the tried and true investigative maxim of “follow the money” should suffice in convincing the rational mind that this number is only going to grow dramatically. Especially when comparing the $250 million in bitcoin spent in 2012 on the dark web with the projected $1 billion mark for 2019.

Origins and operation

It was the invention of none other than the U.S. military—the Navy, of all branches, if you’d believe it. Seeking an easy way for spy networks to communicate without having to lug heavy encryption equipment to remote and hostile corners of the globe, the U.S. Naval Research Laboratory (NRL) came up with an ingenious solution. Ditching the equipment, it created an overlay network of unique address protocols and a convoluted routing system, effectively masking both the source and destination of all its traffic. By forgoing the traditional DNS system and relying instead on software specific browsers like Tor and Freenet and communication programs like I2P among others, dark web traffic was rendered invisible to traditional crawlers. Furthermore, with these browsers routing traffic through multiple user stations around the world, accurate tracking became extremely difficult. This solution afforded both flexibility and mobility for quick and easy insertion and extraction of human assets while securing sensitive communication to and from the field.

There was only one element missing. As co-creator Roger Dingledine explained, if only U.S. Department of Defense (DoD) personnel used the network, it wouldn’t matter that source and destination were masked between multiple user stations. All users would be identifiable as part of the spy network. It would be like trying to hide a needle in a stack of needles. What the dark web needed was a haystack of non-DoD users. And so in 2002 the software was made open source and anyone seeking the option to communicate and transact globally was invited to download it. Thousands of freedom-conscious people heeded the call, and thus the dark web was born.

But freedom is morally ambiguous, granting expression to the best and worst urges of humanity. This is why security officers and senior executives in banks and businesses, insurance providers and intelligence agencies, all need to know who is using the dark web, what it is being used for, and how imminent is the threat it poses to their operations.

 [This article is originally published in calcalistech.com By riel Yosefi and Avraham Chaim Schneider - Uploaded by AIRS Member: Eric Beaudoin]

Categorized in Deep Web

[This article is originally published in icij.org written by Razzan Nakhlawi - Uploaded by AIRS Member: Robert Hensonw]

Web scraping: How to harvest data for untold stories

The thing about investigative reporting is, it’s hard work.

Sure, we’ve got more data available now. But data presents its own challenges: You’re tackling a massive pile of information, looking for the few best bits.

A technique called web scraping can help you extract information from a website that otherwise is not easily downloadable, using a piece of code or a program.

Web scraping gives you access to information living on the internet. If you can view it on a website, you can harvest it. And since you can collect it, you might as well automate that process for large datasets — at least if the website’s terms and conditions don’t say otherwise.

And it really helps. “You might go to an agency’s website to get some data you’re interested in, but the way they’ve got their web app set up you’ve got to click through 3,000 pages to get all of the information,” said Investigative Reporters and Editors training director Cody Winchester.

What’s the solution? Web scraping. You can write a script in a coding language (Python is one) that funnels the desired information into a spreadsheet and automatically flicks through all of the pages. Or you could bypass coding completely and use an application to handle the scraping for you, for example OutWit Hub, a point-and-click tool that recognizes online elements and downloads and organizes them into datasets.
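As a concrete illustration of the scripted route, the sketch below uses Python’s requests and BeautifulSoup libraries to walk through numbered pages and collect table rows into a CSV file. The URL pattern and HTML structure are hypothetical; a real scraper has to be adapted to the target site (and its terms of service).

import csv
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example-agency.gov/records?page={}"  # hypothetical paginated listing

with open("records.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    for page in range(1, 51):  # flick through pages 1-50 automatically
        resp = requests.get(BASE_URL.format(page), timeout=30)
        if resp.status_code != 200:
            break
        soup = BeautifulSoup(resp.text, "html.parser")
        for row in soup.select("table tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if cells:
                writer.writerow(cells)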

Why do it?

Web scraping gives reporters the ability to create their own datasets with scraped information, opening the possibility of discovering new stories — a priority for investigative journalists.

Jodi Upton, the Knight Chair of Data and Explanatory Journalism at Syracuse University, began her career doing old-school “scraping.” Before online databases were widely used, when she only had access to paper records, she created her own databases manually. For Upton’s work, it was a necessity.

"When you’re trying to do news stories or investigative projects that require really interesting data, often it means you are creating a database yourself," Upton said. Now it’s a lot easier, though the raw product, data itself, isn’t always easy to get your hands on.

There isn’t much incentive for organizations to disclose important data unless required to by law. Even then, the government does a poor job of data maintenance.

“We do have some data from the government, but we know that it is so inaccurately kept that there are some really good stories in finding out just how wrong they are,” Upton said.

Working on USA Today’s Mass Killings project, an investigation into Federal Bureau of Investigation mass homicide data, Upton and the rest of the data team scoured FBI data for mass homicides. The data was so poorly kept that the team had to hand-check and verify every incident itself. They found many more incidents the FBI had failed to log.

Upton said she was concerned. “This is our premiere crime fighting agency in the U.S. and when it comes to mass killings, they’re right around only 57 percent of the time.”

Sometimes the government will simply refuse to hand over data sets.

IRE’s Winchester described his attempt to get a database from a South Dakota government lobbyist, who argued that putting data up on a webpage was transparent enough:

“I put in a records request to get the data in the database that was powering their web app, and they successfully argued, ‘We’re already making the information available, we don’t have to do anything special to give it to you as data’.”

Aside from structured data, which is organized to make it more accessible, some stories are born from journalists giving structure to unstructured information. In 2013, Reuters investigated a marketplace for adopted children, who were being offered to strangers on Yahoo message boards by the parents or guardians who had taken them in.

 

The investigative team scraped the message boards and found 261 children on offer. The team was then able to organize the children by gender, age, nationality and by their situations, such as having special needs or a history of abuse.

“That is not a dataset that a government agency produces. That is not a dataset that is easy to obtain in any way. It was just scraping effectively; a social media scraping,” Upton said.

How could you use web scraping?

Samantha Sunne, a freelance data and investigative reporter, created a whole tutorial for those without coding experience. “When I’m investigating stories as a reporter, I don’t actually write code that often,” Sunne said.

Instead, she uses Google Sheets to scrape tables and lists off a single page, using a simple formula within the program. The formula imports a few HTML elements into Google Sheets and is easy enough for anyone with basic HTML knowledge to follow.

You can read her entire tutorial here.
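The formula Sunne describes is Google Sheets’ built-in IMPORTHTML function, which pulls a table or list straight from a URL into the sheet. Typed into a cell, it might look like this (the URL and table index are placeholders):

=IMPORTHTML("https://example.com/stats-page", "table", 1)

The second argument is either "table" or "list", and the third is which table or list on the page to import, counting from 1.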

“I’ve used it for court documents at a local courthouse, I use it for job postings for a newsletter I write about journalism happenings,” Sunne said.

“It’s a spreadsheet that automatically updates from like 30 different job boards. It makes the most sense for things that continually update like that.”

How does ICIJ use web scraping? (This is for our more technically savvy readers!)

ICIJ developer Miguel Fiandor handles data harvesting on a much grander scale, trawling hundreds of thousands of financial documents.

Fiandor’s process begins by opening Google DevTools in the Chrome browser. It’s a mode that allows the user to see the inner workings of a website and play around with its code.

Then he uses the ‘Network’ tab in the Developer Tools window to find the exact request he needs. (A request is how a browser retrieves a webpage’s files from the website’s servers.)

He studies the communication between the website and his browser and isolates the requests he wants to target. Fiandor tests those requests with cURL, a Linux command that he can use from his computer terminal. This bypasses the need for a browser.

Next, Fiandor uses the BeautifulSoup library, which is installed through Python’s package manager.

Code for scraping a corporate registry used in the Paradise Papers.

BeautifulSoup allows the user to parse HTML, or separate it into useful elements. After the request, he’ll save the data onto his computer, then route those elements into a spreadsheet and run his script.
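A stripped-down version of that request-then-parse step might look like the Python sketch below, which fetches one page, pulls out the elements of interest with BeautifulSoup, and appends them to a CSV row. The URL and CSS selectors are placeholders, not ICIJ’s actual scraper.

import csv
import requests
from bs4 import BeautifulSoup

url = "https://example-registry.gov/company/12345"  # placeholder target
resp = requests.get(url, timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
name = soup.select_one("h1.company-name")  # hypothetical selectors
directors = [li.get_text(strip=True) for li in soup.select("ul.directors li")]

with open("companies.csv", "a", newline="", encoding="utf-8") as out:
    csv.writer(out).writerow(
        [name.get_text(strip=True) if name else "", "; ".join(directors)]
    )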

Simple enough, right?

Categorized in Investigative Research

[This article is originally published in ijnet.org written by ALEXANDRA JEGERS - Uploaded by AIRS Member: Daniel K. Henry]

If you didn’t make it to Seoul for this year’s Uncovering Asia conference — or just couldn’t be at two panels at the same time — never fear, tipsheets from the impressive speakers are here! But just in case you can’t decide where to start, here are five presentations that are definitely worth checking out.

How to Make a Great Investigative Podcast

The human voice is a powerful tool. When someone is telling you a good story, you just can’t stop listening. It is, however, sometimes difficult to construct a good storyline for radio — especially if that’s new territory for you. In this excellent tipsheet, radio veteran Sandra Bartlett and Citra Prastuti, chief editor of Indonesian radio network Kantor Berita Radio, explain how to create images in your listener’s brain. Be sure to check out this story on some of their favorite investigative podcasts.

Best Verification Tools

From Russian trolls to teenage boys in Macedonia, Craig Silverman has exposed a wide gamut of disinformation operations around the world. He shared his experiences and research tips on a panel on fake news. Although years of experience like Silverman’s is certainly helpful, you don’t have to be an expert to spot fake news — or even a tech geek. In his tipsheet, Silverman continuously compiles tools that will help you easily check the accuracy of your sources.

Mojo in a Nutshell

Never heard of SCRAP or DCL? Then you are no different to most of the participants at the mojo workshop of award-winning television reporter Ivo Burum. Mojo is short for mobile journalism, which is becoming increasingly important in competitive, fast-moving newsrooms. Burum breaks down how to shoot, edit and publish an extraordinary video story just using your smartphone. Be sure not to miss his YouTube videos on mastering KineMaster or iMovie Basics or any of his regular columns on GIJN.

How to Track Criminal Networks

Transnational organized crime today generates $2 trillion in annual revenue, about the size of the UK economy, according to the UN Office on Drugs and Crime. It’s no wonder that, with that kind of cash on hand, authorities throughout the world often seem powerless to police them. But almost everybody leaves a digital trail, according to international affairs and crime reporter Alia Allana, who spoke at the Investigating Criminal Networks panel.

Web Scraping for Non-coders

Ever had a PDF document that you could not search with Ctrl + F? Or looked for specific information on a website with a seemingly endless number of pages? When documents have hundreds of pages or websites scroll for miles, it can be frustrating — not to mention time-consuming. With Pinar Dag and Kuang Keng Kuek Ser’s guidance, you’ll be web scraping like a pro in no time.

This post was originally published by the Global Investigative Journalism Network.

Alexandra Jegers is a journalist from Germany who has completed the KAS multimedia program. She has studied economics in Germany and Spain and now writes for Handelsblatt, Capital, and Wirtschaftswoche.

Main image CC-licensed by Unsplash via Evan Kirby.

Categorized in Investigative Research

Fake peer reviews are a problem in academic publishing. A big problem. Many publishers are taking proactive steps to limit the effects, but massive purges of papers tainted by problematic reviews continue to occur; to date, more than 500 papers have been retracted for this reason. In an effort to help, Clarivate Analytics is unveiling a new tool as part of the December 2017 release of ScholarOne Manuscripts, its peer review and submission software. We spoke to Chris Heid, Head of Product for ScholarOne, about the new pilot program to detect unusual submission and peer review activity that may warrant further investigation by the journal.

Retraction Watch: Fake peer reviews are a major problem in publishing, but many publishers are hyper-aware of it and even making changes to their processes, such as not allowing authors to recommend reviewers. Why do you think the industry needs a tool to help detect fake reviews?

Chris Heid: Although the evidence is clear that allowing authors to suggest reviewers increases the chances of peer review fraud, there are still significant numbers of journals that use this as one of many methods to find qualified reviewers. We estimate that about half of the journals using ScholarOne Manuscripts continue to allow authors to add recommended reviewers during submission despite the risk.

The reason that journals don’t completely lock down these suggestions from authors, or limit profiles to verified institutional address, is that journals continue to struggle to find peer reviewers. According to our analysis of five years of peer review trends on ScholarOne journals, the average number of invitations sent to reviewers for research articles has almost doubled in the last five years.

Instead of trying to eliminate all risk and make the process even slower for peer review, journal publishers take a calculated risk and rely on human intervention to mitigate it. This adds both time to the overall process, and costs for the publisher to staff extra background checking. This means peer review is slower and costs publishers more for every article.

This tool’s goal is to improve a journal’s reputation by simplifying the management of a process, which relies on hundreds or even thousands of individual stakeholders. Even though the vast majority of peer reviews are legitimate, the reputational risks are very real for publishers. Why continue to work based solely on trust and human efforts when technology can automate this for us?

Clarivate Analytics is leading the charge on multiple fronts to provide the tools and information needed to combat fraud and improve the peer review process from end to end.

For example, by the end of the year, journals can use Publons Reviewer Locator/Connect (final name undecided) — the most comprehensive and precise reviewer search tool — to help identify the right reviewers, assess their competency, history and availability, contact them and invite them to review.

Recognition through Publons helps motivate reviewers to do a thoughtful and efficient job. The fraud prevention tool follows the submission of the review report to flag potential fraud.

RW: Can you say briefly how the tool works? What it looks for, etc? Anyone can spot a reviewer that’s not using an institutional email address, so what other qualities help signify a review is fake?

CH: The presence of a non-institutional email or the absence of a Publons reviewer profile with verified review history is not foolproof for identifying peer review fraud. The fraud prevention tool evaluates 30+ factors based on web traffic, profile information, submission stats and other server data, compiled by our proprietary algorithm, to find fake profiles, impersonators and other unusual activity. This happens multiple times throughout the submission and review process.

By themselves, these factors may not trigger an alert, but combined with other actions, they can increase the risk level of a submission. From there, it is up to the journal editor and/or publisher to determine the next steps. In the long run, this tool will help to reduce the amount of retractions by highlighting issues during the submission process, instead of after publication.

RW: How can journals and publishers get access to the tool? Will there be a fee?

CH: Because the integrity of published research is at risk due to peer review fraud, Clarivate is offering this as a core, free feature in the next ScholarOne release (December 2017). Journals may request the tool to be activated in the interface at any time. The tool can also be configured to set report access levels by role for each individual journal.

RW: Have you tested the tool’s effectiveness? Do you have any data on its rate of success, as well as false negatives or positives?

CH: The tool relies on alerts based on the right combination of factors and leaves the decision to the journal editor or publisher. This is similar to alerts a bank may issue about potential fraud. For example, if you receive an alert about unusual charges on your account, it could be legitimate if you’re on vacation or it could indicate actual credit card theft.

Clarivate actively worked on this capability for the past year, continuing to balance and refine the approach with feedback from publishers who are managing this risk every day. Refinements were made based on feedback including tool sensitivity and user interface.

Early testers indicated that a number of alerts resulted in direct action, including the rejection of a paper that was already accepted but unpublished, and a re-review of another paper by an editor and external reviewer. Once the feature is live in December, we expect additional refinement through feedback tools.

 Source: This article was published retractionwatch.com By Chris Heid

Categorized in Investigative Research

'Civic technologist' Friedrich Lindenberg shares a range of tools journalists can use in investigative journalism projects

Investigative journalism has long been the marker by which news organizations – and journalists – measure their worth.

"As a journalist, your main tool is talking to people and asking the right questions of the right people," said civic technologist and self-described "OpenGov and data journalism geek" Friedrich Lindenberg in a webinar on investigative journalism tools for the International Centre for Journalists last week.

"This is still true, but also you can ask the right questions with the right databases. You can ask the right questions with the right tools."

Lindenberg listed an arsenal of tools the investigative journalist can equip themselves with. Here are some of the highlights. 

DocumentCloud

Lindenberg described DocumentCloud as a "shared folder of documents", offering different folders that can be used for various investigations, control over who can access which documents, the ability to annotate different parts of documents, search throughout and embed segments or entire documents.

Even better, DocumentCloud looks for "entities" – such as people, companies, countries, institutions – identifies them and makes them searchable, which is especially useful for legal documents that may stretch into hundreds of pages when you are only interested in a few key points.

DocumentCloud is run by IRE but Lindenberg encouraged journalists to contact him at SourceAfrica.net, where an open source version of the software is available.
Screengrab from documentcloud.org

Overview

A "bit more of an expert tool", according to Lindenberg, Overview lets the user import documents from DocumentCloud or CSV files and then counts the frequency of words to make a "hierarchy of terms" for words.

When used this way, Overview can give a quick rundown of large numbers of documents, making it easier to understand the core topics.

OpenCorporates

Popularised by the dramatisation of the Watergate scandal, All the President's Men, "follow the money" is one of the mantras of investigative journalists everywhere.

Many large and expensive business registries exist to track the myriad connections between individuals and companies, but few are within reach of the press.

One of those few is OpenCorporates, where users can search by name or company and filter by geographical jurisdiction.
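OpenCorporates also exposes its search as a public REST API, which is handy when you have more than a handful of names to check. The sketch below is a minimal Python call to the companies search endpoint; check OpenCorporates’ API documentation for current parameters and rate limits, as heavier use requires an API token.

import requests

# Search the OpenCorporates companies endpoint for a name, filtered by jurisdiction
params = {"q": "Acme Holdings", "jurisdiction_code": "gb"}  # example query and filter
resp = requests.get("https://api.opencorporates.com/v0.4/companies/search", params=params, timeout=30)
resp.raise_for_status()

for item in resp.json()["results"]["companies"]:
    company = item["company"]
    print(company["name"], company["jurisdiction_code"], company.get("company_number"))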

DueDil

DueDil has a similar function to OpenCorporates but is a "slightly better research tool", said Lindenberg, as you can narrow the search on individuals with similar names by searching by birth date.

Where OpenCorporates has a global range of company information, DueDil mainly draws on UK companies. Both operate on a freemium model with monthly fees for greater access.

Investigative Dashboard

Both OpenCorporates and DueDil were built for business purposes, helping people to conduct due diligence on companies and individuals before signing any contracts.

Investigative Dashboard though is tailor-made for journalists. Users can search business records scraped from websites in a range of countries or go through the directory of more than 450 business registries, company lists and "procurement databases" – which highlight the 'hot point' where companies and governments do business – to find detailed information.

"They also have a broad network of researchers in different regions," said Lindenberg, "and they will look at other databases that they will be familiar with and maybe even have stringers and contacts on the ground who will find information and documents."

Paul Radu, an investigative reporter at the OCCRP who helped build the Investigative Dashboard, told Journalism.co.uk the platform has researchers in Eastern Europe, Africa, the Middle East and Latin America.

"We do pro bono due to diligence work for journalists and activists and these people have access to all the open databases," he said. "But also we managed to get some funding to access some pretty expensive databases that are very useful in tracking down the information across borders."
Screengrab from investigativedashboard.org

Tabula

Governments are partial to releasing reports and figures in PDF files, making it difficult for journalists looking to analyse and investigate the data contained within.

In the UK, you can specify usable file formats (Excel or CSV for example) in Freedom of Information requests. But if you are still faced with data locked up in a PDF, you need Tabula.

"It's the gateway drug to data journalism", said Lindenberg of Tabula.

Simply download and install the software, open a PDF in the program, select a table and Tabula will convert it into a workable file format. Magic.
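If you would rather script the extraction than use the desktop app, the companion tabula-py library (which drives the same Tabula engine and requires Java to be installed) can do the conversion in a couple of lines; the file names below are placeholders.

import tabula

# Extract every table in the PDF as a list of pandas DataFrames
tables = tabula.read_pdf("report.pdf", pages="all")
print(len(tables), "tables found")

# Or dump all tables straight to a CSV file
tabula.convert_into("report.pdf", "report.csv", output_format="csv", pages="all")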

Lindenberg suggested many more tools to help journalists analyse documents and data, scrape web pages and further their investigations alongside the hour-long webinar.

However, he stressed that for best results viewers should pick one tool for a project and learn to use it well, rather than trying to get to grips with lots of new things at once.

"Learning these tools requires a bit of time experimentation," he said, "a bit of willingness to get into this new thing and once you've done that you will get some benefits out of it.

"If you're saying 'I'm not a computer person' I want you to stop doing that and say instead that you're a journalist who has arrived in the 21st century and is using digital tools in a way to conduct fantastic investigations."

Source: This article was published journalism.co.uk By Paul Radu,

Categorized in Investigative Research

The academic world is supposed to be a bright-lit landscape of independent research pushing back the frontiers of knowledge to benefit humanity.

Years of fingernail-flicking test tubes have paid off by finding the elixir of life. Now comes the hard stuff: telling the world through a respected international journal staffed by sceptics.

After drafting and deleting, adding and revising, the precious discovery has to undergo the ritual of peer-reviews. Only then may your wisdom arouse gasps of envy and nods of respect in the world’s labs and lecture theatres.

The goal is to score hits on the international SCOPUS database (69 million records, 36,000 titles – and rising as you read) of peer-reviewed journals. If the paper is much cited, the author’s CV and job prospects should glow.

SCOPUS is run by Dutch publisher Elsevier for profit.

It’s a tough track up the academic mountain; surely there are easier paths paved by publishers keen to help?

Indeed – but beware. The 148-year-old British multidisciplinary weekly Nature calls them “predatory journals” luring naive young graduates desperate for recognition.

‘Careful checking’

“These journals say: ‘Give us your money and we’ll publish your paper’,” says Professor David Robie of New Zealand’s Auckland University of Technology. “They’ve eroded the trust and credibility of the established journals. Although easily picked by careful checking, new academics should still be wary.”

Shams have been exposed by getting journals to print gobbledygook papers by fictitious authors. One famous sting reported by Nature had a Dr. Anna O Szust being offered journal space if she paid. “Oszust” is Polish for “a fraud”.

Dr Robie heads AUT’s Pacific Media Centre, which publishes the Pacific Journalism Review, now in its 23rd year. During November he was at Gadjah Mada University (UGM) in Yogyakarta, Central Java, helping his Indonesian colleagues boost their skills and lift their university’s reputation.

The quality of Indonesian learning at all levels is embarrassingly poor for a nation of 260 million spending 20 percent of its budget on education.

The international ranking systems are a dog’s breakfast, but only UGM, the University of Indonesia and the Bandung Institute of Technology just make the tail end of the Times Higher Education world’s top 1000.

There are around 3500 “universities” in Indonesia; most are private. UGM is public.

UGM has been trying to better itself by sending staff to Auckland, New Zealand, and Munich, Germany, to look at vocational education and master new teaching strategies.

Investigative journalism

Dr. Robie was invited to Yogyakarta through the World Class Professor (WCP) programme, an Indonesian government initiative to raise standards by learning from the best.

Dr. Robie lectured on “developing investigative journalism in the post-truth era,” researching marine disasters and climate change. He also ran workshops on managing international journals.

During a break at UGM, he told Strategic Review that open access – meaning no charges made to authors and readers – was a tool to break the user-pays model.

AUT is one of several universities to start bucking the international trend to corral knowledge and muster millions. The big publishers reportedly make up to 40 percent profit – much of it from library subscriptions.

Pacific Journalism Review’s Dr. David Robie being presented with a model of Universitas Gadjah Mada’s historic main building for the Pacific Media Centre at the editor's workshop in Yogyakarta, Indonesia.

According to a report by AUT digital librarians Luqman Hayes and Shari Hearne, there are now more than 100,000 scholarly journals in the world put out by 3000 publishers; the number is rocketing so fast library budgets have been swept away in the slipstream.

In 2016, Hayes and his colleagues established Tuwhera (Māori for “be open”) to help graduates and academics liberate their work by hosting accredited and refereed journals at no cost.

The service includes training on editing, presentation and creating websites, which look modern and appealing. Tuwhera is now being offered to UGM – but Indonesian universities have to lift their game.

Language an issue

The issue is language, and it’s a problem, according to Dr. Vissia Ita Yulianto, researcher at UGM’s Southeast Asian Social Studies Centre (CESASS) and a co-editor of the IKAT research journal. Educated in Germany, she has been working with Dr. Robie to develop journals and ensure they are top quality.

“We have very intelligent scholars in Indonesia but they may not be able to always meet the presentation levels required,” she said.

“In the future, I hope we’ll be able to publish in Indonesian; I wish it wasn’t so, but right now we ask for papers in English.”

Bahasa Indonesia, originally trade Malay, is the official language. It was introduced to unify the archipelagic nation with more than 300 indigenous tongues. Outside Indonesia and Malaysia it is rarely heard.

English is widely taught, although not always well. Adrian Vickers, professor of Southeast Asian Studies at Sydney University, has written that “the low standard of English remains one of the biggest barriers against Indonesia being internationally competitive.

“… in academia, few lecturers, let alone students, can communicate effectively in English, meaning that writing of books and journal articles for international audiences is almost impossible.”

Though the commercial publishers still dominate there are now almost 10,000 open-access peer-reviewed journals on the internet.

“Tuwhera has enhanced global access to specialist research in ways that could not previously have happened,” says Dr Robie. “We can also learn much from Indonesia and one of the best ways is through exchange programmes.”

This article was first published in Strategic Review and is republished with the author Duncan Graham’s permission. Graham blogs at indonesianow.blogspot.co.nz

Categorized in How to

The Organized Crime and Corruption Reporting Project (OCCRP), a non-profit network of investigative journalism centers in Europe and Eurasia, has launched a new data platform to enable journalists and researchers to sift more than 2 million documents and use the findings in their investigations.

People using the new data platform, called ID Search, will be able to set up email alerts notifying them when new results appear for their searches or for persons tracked on official watchlists. They can also create their own private watchlists.

 


 

Using the new tool, journalists and researchers will be able to access data including gazettes of commerce, company records, leaks, court cases and more. One of the most comprehensive open source lists of Politically Exposed Persons is also at users’ disposal. Starting today, most sources on ID Search will be updated every 24 hours.

Documents and databases are also cross-referenced with watchlists and international sanctions lists so that persons of interest involved in organized crime or corruption can be identified.

In the past few weeks, OCCRP has added documents from five additional offshore jurisdictions, reflecting growing public awareness of the shadowy structures that drive the criminal economy in the wake of the Panama Papers investigation.

The new tool is part of OCCRP's Investigative Dashboard (ID), a ground-breaking platform bringing together data search, visualizations and researcher expertise. It is currently used by more than 4,400 journalists including those from OCCRP's 24 partner centers.

Users can access the search engine at https://data.occrp.org.

Author:  TOM KING

Source:  https://www.occrp.org

Categorized in Investigative Research

How do you research thoroughly, save time, and get directly to the source you wish to find? GIJN’s Research Director Gary Price, who is also editor of InfoDOCKET, and Margot Williams, research editor for investigations at The Intercept, shared their Top 100 Research Tools. Overwhelmed with information, we asked Williams and Price to refine their tools and research strategies down to a Top 10.

What are the bare-essentials for an investigative journalist?

1. Security and Privacy

Security tools have never been more important. There is so much information that you give out without even knowing it. Arm yourself with knowledge. Be aware of privacy issues and learn how to limit your own traceability. This is paramount for your own security and privacy. Price and Williams recommend tools such as Tor and Disconnect.me, which keep others from tracing your browsing.

 


2. Find Specialized Sites and Databases

Do not run a generalized blind search. Think about who will have the information that you want to find. Get precise about your keywords. Does the file you are looking for even exist online? Or do you have to get it yourself in some way? Will you have to find an archive? Or get a first-person interview? Fine-tuning your research process will save you a lot of time.

3. Stay Current

Price highly recommends Website Watcher. This tool automates the entire search process by monitoring your chosen web pages and sending you instant updates when there are changes on a site. It allows you to stay current with little effort. No more refreshing a webpage over and over again.

4. Read from Back to Front

Where do you start looking for information? Do you start with the headline or the footnotes? Most people start with the headline; however, Williams gives an inside tip: she always starts at the footnotes. The footnotes inform the article's body, so you can get straight to your information without picking up any bias from the author.

5. Create Your Own Archive

The Wayback Machine is a digital archive of the web. It lets you see archived versions of web pages across time. Most importantly, Price recommends that you use it to develop your own personal archive. A feature of the Wayback Machine now allows you to archive most webpages and PDF files. Do not keep all your sources on a site you might not always be able to access.

You can keep the files not only on your own hard drive but also share them online. Another useful resource for archiving is Zotero, a personal information management tool. Watch Price teach how to use this incredible archive and information management tool here. You can also build your own data feeds with IFTTT; Gary Price shows how to do this here.

6. Pop Up Archive

Sick of scanning through podcasts and videos to get the information you need? Audio and video search is becoming increasingly popular and can save you an incredible amount of time. This can be done with search engines like Pop Up Archive and C-SPAN.

7. Ignore Mainstream Media Reports

Williams ignores sites like Reddit at all costs. These sites can lead your research astray, and you can become wrapped up in information that might later be deemed false. Price is also wary of Wikipedia, for obvious reasons: any person, anywhere, at any time can change a story as they see fit. Stay curious, and keep digging.

 

8. Marine Traffic

MarineTraffic.com makes it possible to track almost any kind of vessel, with real-time ship locations, port arrivals, and departures. You can also see a boat's track and follow the path of any vessel's movements. Check out Price's tutorial video on FlightAware, a search tool that traces real-time and historical flight movements.

9. Foreign Influence Explorer

Need to find sources on governments and money tracking? Foreign Influence Explorer will make your searches incredibly easy. It makes it possible to track disclosures as they become available and lets you find out what people or countries have given money to, with exact times and dates.

10. If You Are Going to Use Google…

Use it well. Google's potential is rarely reached. For a common search engine, you can get extremely specific results if you know how. Williams explains that Congress has a terrible search engine on its site, but if you use Google you can refine your search by typing your keywords next to "site:(URL)", as in the example below. You can even narrow results by the time and date of publication. Watch a video demonstration of a Google advanced search feature here.
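For example, to search only Congress's site for a phrase, or to restrict results to a particular file type, the queries might look like this (the keywords are placeholders):

site:congress.gov "net neutrality"
site:congress.gov "net neutrality" filetype:pdf

From the results page, Google's Tools menu then lets you filter by date range to narrow down when a page was published.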

Source : gijc2015.org

Categorized in Investigative Research