A data approach to uncovering fraud in a multinational oil and gas supply company
NW Hinwise Solutions was tasked with sorting, linking and reporting on a collection of more than half a million company records in order to produce an affidavit to submit to a court as evidence of fraud. We created a platform for manually extracting data from more than 200,000 scans of the company’s paper records and linking them with the company’s internal accounting and banking records. The platform included auditing, search and reporting tools, giving the lawyers a cleaner, easier system from which to construct their case.
In 2013 our client, a multinational oil and gas supplier, had healthy revenue and high expected net profit, but had been struggling with liquidity issues for nearly 10 years. When a new manager was hired, he went over the books and discovered evidence that one of the owners was stealing large amounts of money from the company. Large cheques had been written out to him directly, or to cash. Certain suppliers, owned by members of this owner’s family, were charging four times the going rate for supplies and equipment.
NW Hinwise Solutions UG was contracted to prepare legal discovery material, usable in court, to prove the extent of the fraud. We needed to collect and collate data from four sources: source financial documents, internal accounting records, bank statements and detailed bank source documents. This proved to be quite a challenge because:
- The source financial documents were all on paper, on location in northern Africa. A company was contracted to scan more than 200,000 documents in a few short weeks, and the scans we received had no order or context. Approximately 30% of the files were blank, 10% spanned multiple pages, and 25% were not financial in nature.
- The source financial documents were largely in French, and mostly handwritten.
- The internal accounting records were stored in PDF format, with the most important information on a memo line that was often truncated. The truncated information could not be recovered.
- The North African bank withheld its statements and detailed records for months, and when it finally delivered them, they arrived as poorly scanned PDF documents.
To make sense of the data we first needed to get it into a form that could be manipulated electronically. Making sense of the data was an iterative refinement process, and each iteration gave us increased data quality.
First we had to make sense of the source financial documents (the ones on paper):
- Using Optical Character Recognition (OCR), we built a proprietary sorting algorithm that sorted approximately 60% of the documents into rough categories. Local workers were hired, and a customized tool was built for them to categorize the remainder.
- Using Django and Python, we coded a data entry tool that displayed each PDF alongside entry fields for workers to key in information. This became the basis for a custom data management platform.
- We contracted two companies in India to key in information based on our instructions. We sent both the same test lot of documents and compared the results electronically, auditing entries that had discrepancies. After a few iterations we chose the company with the best results and had it enter the rest of the data. Auditing was done by the team lead in India, and a final audit was then done locally to ensure data quality.
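The electronic comparison of the two vendors' test lots can be illustrated with a minimal Python sketch. It is not the tool we built: the record layout (a document ID mapped to keyed fields), the field names and the normalization rule are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: document ID -> {field name: value keyed by the vendor}
KeyedLot = Dict[str, Dict[str, str]]


def normalize(value: str) -> str:
    """Collapse whitespace and case so trivial differences are not flagged."""
    return " ".join(value.split()).casefold()


def find_discrepancies(
    vendor_a: KeyedLot, vendor_b: KeyedLot
) -> List[Tuple[str, str, str, str]]:
    """Return (doc_id, field, value_a, value_b) wherever the vendors disagree.

    A field or document missing from one vendor is treated as an empty value,
    so omissions are flagged for audit as well.
    """
    issues = []
    for doc_id in sorted(set(vendor_a) | set(vendor_b)):
        fields_a = vendor_a.get(doc_id, {})
        fields_b = vendor_b.get(doc_id, {})
        for field in sorted(set(fields_a) | set(fields_b)):
            value_a = fields_a.get(field, "")
            value_b = fields_b.get(field, "")
            if normalize(value_a) != normalize(value_b):
                issues.append((doc_id, field, value_a, value_b))
    return issues
```

Only the entries returned by `find_discrepancies` need a human auditor, which keeps the manual review proportional to the disagreement between vendors rather than to the size of the lot.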
At the same time, data needed to be converted from the internal accounting PDFs and bank PDFs into a more usable form:
- We extracted the data directly to CSV using commercial software, but the output was heavily garbled.
- Using a combination of proprietary scripts and detail-oriented workers, we reconstructed the account transactions based on the context of the data involved.
- We then used a tool to create transaction records, linking the bank records with their corresponding internal accounting records based on the number of fields they had in common.
- Finally these transactions were linked to the paper records. For this we used a manual tool that displayed a transaction and a list of possible candidates based on similarity of contained information.
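Linking records by the number of fields they have in common, and then listing candidates for manual confirmation, can be sketched as follows. This is an illustration under stated assumptions, not our production code: the field names in `MATCH_FIELDS`, the `min_score` threshold and the exact-equality comparison are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical fields that both a bank record and an accounting record may carry.
MATCH_FIELDS = ("date", "amount", "cheque_no", "payee")


def shared_fields(bank: Record, ledger: Record) -> int:
    """Count fields that are non-empty in the bank record and equal in both."""
    return sum(
        1
        for field in MATCH_FIELDS
        if bank.get(field) and bank.get(field) == ledger.get(field)
    )


def rank_candidates(
    bank: Record, ledger_rows: List[Record], min_score: int = 2
) -> List[Tuple[int, Record]]:
    """Return (score, row) pairs scoring at least min_score, best match first."""
    scored = [(shared_fields(bank, row), row) for row in ledger_rows]
    scored = [pair for pair in scored if pair[0] >= min_score]
    # Sort on the score only; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

A ranked list like this suits a manual linking tool: the operator sees the most plausible candidates first, and the final decision stays with a person rather than with the score.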
With the data properly reconstructed, we created a series of reporting tools to generate an affidavit of records to submit to court, as well as tools to visualize document histories for selected transactions.