In a world where access to information is increasingly free, Open Data plays an important and rapidly growing role. Many sectors can exploit the potential of this high-value data, particularly local banking: such banks constantly deal with societal, economic, and demographic questions (risk, credit, investment, etc.), which are precisely the themes most often covered by Open Data. This is why the DataLab Group has decided to exploit this booming new source.
Introduction
Open Data refers to freely accessible and reusable data made available by the government (data.gouv.fr, the ministries), by public organizations such as INSEE, or through platforms such as OpenDataSoft. Any user can access and reuse this data as they wish, without restriction. This data fosters collaboration, participation, and social innovation. Opening data to the general public is part of a broader Open Source movement, particularly in the digital sector. The development of Open Data has also been facilitated by the adoption of several laws in recent decades:
- The CADA law (1978) allows anyone who requests it to access administrative documents; it was amended in 2005 to allow their reuse as well.
- In December 2015, the Valter law established the principle of free public data.
- At the end of 2016, the Lemaire law required public administrations to publish their main documents and data online.
- In 2019, the European Union's Open Data Directive required member states to publish reusable data from their public sector and defined categories of high-value datasets, such as meteorological data, that must be made accessible by June 2024.
- In early 2021, Prime Minister Jean Castex asked public institutions to accelerate the opening of their data.
Alongside this legislative movement, the French government created its Open Data platform, data.gouv.fr, in 2010; it is maintained by Etalab. To develop the platform, Etalab relies on the International Open Data Charter, published in 2013, which sets out seven criteria for publishing information. Data must be:
- Raw
- Fresh
- Accessible (at a reasonable cost)
- Non-discriminatory in their uses
- Machine-readable
- In an open format
- Subject to an open license
Challenges and Opportunities for the DataLab Group
Crédit Agricole Group, the second-largest French banking group, is a major player in the French economy. It is highly influential and present throughout the national territory, thanks to its branch network (the largest in France, with 7,400 branches) and its coverage of all banking businesses and services (retail banking, savings, insurance, asset management, etc.). The group therefore has every interest in exploiting Open Data, which provides information of many kinds (finance, transport, culture, environment, etc.) at different spatial scales (IRIS statistical units, municipalities, departments, etc.).
In this context, the DataLab Group has carried out work to retrieve and exploit this data for various use cases. The data can be used to enrich the group's internal data, for example to better understand the geographical context of its professional and individual clients. It can also be used to build new projects that rely entirely on Open Data, covering topics such as the energy transition, climate risks, or the identification of attractiveness hubs.
However, to fully harness the potential of Open Data, the DataLab Group must ensure that it uses the data properly. By definition, users of Open Data are free to exploit it as they see fit: raw data can be modified and merged with other retrieved information to make it more relevant, and by building indicators it becomes possible to aggregate and combine different sources to generate new, sometimes unprecedented, insights. This is where the true value of Open Data lies. Caution is still required, however, as not every approach will yield the desired results.
Data Governance
When using data retrieved online, particular attention must be paid to data quality, starting with the selection of sources. The first step when retrieving freely accessible data is to check the license and the issuing source to ensure the reliability of the information; official sources should always be favored. It is also necessary to ensure that the data provides the expected geographical coverage: for example, if you want information on all French departments, check that the dataset actually covers them. Finally, another indicator of source quality is the update frequency; a database must be refreshed regularly to remain exploitable.
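To make the coverage and freshness checks concrete, here is a minimal sketch in Python with pandas, assuming the data arrives as a CSV file with a department code column and an update date (the file name and column names are hypothetical):

```python
import pandas as pd

# Hypothetical extract downloaded from an Open Data portal; the file name and
# the column names (code_departement, date_maj) are assumptions for illustration.
df = pd.read_csv("open_data_extract.csv", dtype={"code_departement": str})

# Geographical coverage: metropolitan and overseas France count 101 departments.
expected_departments = 101
found = df["code_departement"].nunique()
print(f"Departments covered: {found}/{expected_departments}")

# Update frequency: how stale is the most recent record?
last_update = pd.to_datetime(df["date_maj"]).max()
staleness = pd.Timestamp.today() - last_update
print(f"Most recent update: {staleness.days} days ago")
```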
Next, once the source is selected, several checks must be carried out upon receipt, as illustrated in the sketch after this list:
- Check that all expected columns are present
- Check the number of missing values
- Pay particular attention to extreme values
- Verify that geocoded data is located in the French territory
- Etc.
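As a minimal sketch of these reception checks, again with hypothetical file and column names, the verifications above might look like this with pandas:

```python
import pandas as pd

# Reception checks on a freshly downloaded file; the file name and the column
# names (code_departement, latitude, longitude, valeur) are assumptions.
df = pd.read_csv("open_data_extract.csv", dtype={"code_departement": str})

# 1. All expected columns are present.
expected_columns = {"code_departement", "latitude", "longitude", "valeur"}
missing_cols = expected_columns - set(df.columns)
assert not missing_cols, f"Missing columns: {missing_cols}"

# 2. Share of missing values per column.
print(df[sorted(expected_columns)].isna().mean().sort_values(ascending=False))

# 3. Extreme values: inspect the tails of the distribution.
print(df["valeur"].describe(percentiles=[0.01, 0.99]))

# 4. Geocoded points fall within a rough bounding box of metropolitan France.
in_france = df["latitude"].between(41.0, 51.5) & df["longitude"].between(-5.5, 10.0)
print(f"Points outside metropolitan France: {(~in_france).sum()}")
```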
Finally, if doubts remain about the veracity of the content, and when the geographical location of an element is provided, you can check the actual presence of a sample of records by projecting them onto a satellite map, for example. In other cases the information is simply not verifiable (e.g., the number of births in a municipality); there, only the source can vouch for its accuracy.
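For the visual spot-check mentioned above, a small sample of geocoded records can be projected onto satellite imagery with folium. This is only a sketch under assumptions: the tile provider, the file name, and the column names (latitude, longitude, nom) are illustrative.

```python
import folium
import pandas as pd

# Spot-check a sample of geocoded records against satellite imagery.
df = pd.read_csv("open_data_extract.csv")
sample = df.dropna(subset=["latitude", "longitude"]).sample(n=50, random_state=0)

m = folium.Map(
    location=[46.6, 2.4],  # rough centre of metropolitan France
    zoom_start=6,
    tiles="https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}",
    attr="Esri World Imagery",
)
for _, row in sample.iterrows():
    # "nom" is a hypothetical label column; the popup falls back to an empty string.
    folium.Marker(
        [row["latitude"], row["longitude"]],
        popup=str(row.get("nom", "")),
    ).add_to(m)

m.save("sample_check.html")  # open in a browser and eyeball the locations
```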
Once these various steps have been validated, indicators can be generated and different sources combined to extract as much information as possible from this valuable data. When speaking of "extracting as much information as possible", the way the results are presented is also a critical aspect. There are various Data Visualization methods, including dashboards; in this case, however, maps are often the most suitable approach, given that Open Data largely consists of geographical data.
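As an illustration of this cartographic approach, here is a sketch that joins a department-level indicator onto department geometries and renders a choropleth with geopandas. The file names, the "code" join key, and the indicator column "valeur" are assumptions for illustration.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical inputs: an aggregated indicator per department and a GeoJSON
# of French department geometries with a "code" column.
indicator = pd.read_csv("indicator_by_department.csv", dtype={"code_departement": str})
departments = gpd.read_file("departements.geojson")

# Join the indicator onto the geometries and draw a choropleth map.
merged = departments.merge(indicator, left_on="code", right_on="code_departement")

ax = merged.plot(column="valeur", cmap="viridis", legend=True, figsize=(8, 8))
ax.set_axis_off()
ax.set_title("Indicator built from Open Data, by department")
plt.savefig("indicator_map.png", dpi=150)
```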
Conclusion
Open Data is undeniably a vast and valuable source of information, but it must be leveraged effectively. It is a rapidly growing field, and both Crédit Agricole and the DataLab Group have recognized this and have a vested interest in exploring this data further, given the significant economic, energy, and social stakes involved. Numerous future projects will undoubtedly be built around this data as new use cases and sources continue to emerge. For instance, Météo France is expected to release its data by early 2024, including tabular data from various meteorological stations and possibly satellite images of the French territory.