
This is what Europe can do to stimulate Text and Data Mining

- September 12, 2017 in FutureTDM, text and data mining

This press release has been reposted from the FutureTDM website.

Text and data mining (TDM), the use of algorithms to analyse content in ways that would be impossible for humans, is shaping up to be a vital research tool of the 21st century. But Europe lags behind other parts of the world in adopting these new technologies. The FutureTDM project has just concluded its two-year, EC-funded research investigating what is holding Europe back. The project consortium, consisting of 10 European partners led by SYNYO, met with stakeholders and experts from all over Europe, gathering input and carrying out research to understand how Europe can support the uptake of TDM. Open Knowledge International, together with ContentMine, led the work on communication, mobilisation and networking, and undertook research into best practices and methodologies.

The potential benefits, and risks, are huge. According to the project’s economic analysis, TDM technologies could have an impact of as much as USD 110 billion on the European economy by 2020. If Europe is not ready to foster and support the use of TDM, it risks seeing talent and economic benefits go elsewhere.

Legal barriers are a big problem. TDM processes often involve copying content for analysis, so applications of TDM may fall foul of copyright law. The EU has a fragmented landscape of restrictive, often unclear laws that can limit the re-use of content for TDM.

Skills and education in this area also need a boost. Data analysis is fast becoming “the new IT”, and people in all fields, from fashion to finance, could benefit from an education in fundamental data literacy and computational thinking.

Lack of infrastructure and economic incentives are lesser concerns. More information on these barriers is available in the FutureTDM report ‘Policies and Barriers of TDM in Europe’.

FutureTDM has put together concrete, practical proposals to support the uptake of TDM in Europe.
These are summarised in a Roadmap for the EU which focuses on three key phases of support:
  1. Content Availability: making sure content is legally and practically discoverable and re-usable for TDM. Since rights clearance can be practically impossible for many TDM applications, this almost certainly requires copyright reform to allow re-use of content that does not trade on the original creative expression.
  2. Support Early Adopters: there is a need for initiatives that will connect TDM practitioners across domains and sectors, helping them share best practices and learn from each other’s experiences.
  3. The Next Generation: it is important to build a ‘data-savvy’ culture, where all Europeans have a fundamental awareness of the potential uses and benefits of data analytics.
The FutureTDM platform brings together all the results of the project. As well as databases of TDM projects, experts, methods and tools, its Knowledge Base includes a series of practical guidelines for stakeholders in the TDM landscape. These resources offer straightforward, plainly worded advice on legal, licensing and data management issues, as well as on how universities in particular can play a key role in supporting the uptake of TDM in Europe. All outcomes are also summarised in the awareness sheet ‘Outcomes of FutureTDM’.

The FutureTDM project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No 665940.
For further questions please contact: / Tel +43 1 9962011

FutureTDM: The Future of Text and Data Mining

- March 3, 2017 in FutureTDM, OKI Projects, text and data mining

Blog written by Freyja van den Boom (FutureTDM researcher) and Lieke Ploeger.

Since September 2015, Open Knowledge International has been working on new ways to improve the uptake of text and data mining in the EU as part of the FutureTDM project. Text and data mining (TDM) is the process of extracting relevant information from large amounts of machine-readable data (such as scientific papers) and recombining it to unlock new knowledge and power innovation (see ‘Techniques, Tools & Technologies for TDM in Europe’). Project partners include libraries, publishers and universities, as well as the non-profit organisation ContentMine, which advocates for the right to mine content. Open Knowledge International leads the work on communication, mobilisation and networking, and undertakes the research into best practices and methodologies.

A practical example explaining the use of TDM

Because the use of TDM is significantly lower in Europe than in some countries in the Americas and Asia, FutureTDM actively engages with stakeholders in the EU, such as researchers, developers, publishers and SMEs, to help pinpoint why uptake is lower, raise awareness of TDM and develop solutions. This is especially important now, because an exception for TDM under copyright law is being discussed at European level. Such an exception would make copyright law less restrictive for TDM carried out under certain circumstances.

Throughout 2016 we organised Knowledge Cafés across Europe as an informal opportunity to gather feedback on text and data mining from researchers, developers, publishers, SMEs and any other stakeholder groups working in the field, and we held stakeholder consultations with the various communities. In September 2016 we held the first of two workshops in Brussels to discuss the project’s findings, attended by many MEPs and policymakers. In early 2017 a roundtable was organised at the Computers, Privacy and Data Protection (CPDP) conference in Brussels, where the impact of data protection regulations on the uptake of advanced data analysis technologies like TDM was discussed.

MEP Julia Reda discussing the upcoming copyright reform at the FutureTDM workshop

Below are some of the insights we have gained through our research so far, including the main barriers for different TDM stakeholder communities. In the coming months we will publish more of the results, along with proposed solutions for overcoming these barriers.

Education and skill
Researchers need more education on the benefits and practical use of TDM. Industry, the publishing community and academia should work together to develop effective courses pitched at different levels, depending on the discipline and the type of research that is likely to use TDM. We are currently working on TDM education and are looking for feedback on what the learning outcomes should be. If you are interested in getting involved, contact us!

Legal and policy
The legal status of TDM practices, and of the use of results obtained through TDM, is unclear. Barriers include uncertainty about the scope of copyright, database protection, and privacy and data protection regulations. See, for example, our guest blog on this topic.
The current copyright reform discussions focus partly on a TDM exception, which could help provide more clarity. Under discussion are, for example, which data and which uses fall under copyright, including whether there should be a distinction between commercial and non-commercial use. FutureTDM partners are monitoring these developments.

We have recently published the FutureTDM policy framework introducing high level principles that should be the foundation of every stakeholder action that aims to promote TDM. These high level principles are:
  • Awareness and Clarity: actions should improve certainty on the use of TDM technologies. Information and clear actions are crucial for a flourishing TDM environment in Europe.
  • TDM without Boundaries: insofar as appropriate, boundaries should be removed to prevent and eliminate fragmentation in the TDM landscape.
  • Equitable Access: access to TDM tools and technologies, as well as to sources (such as datasets), is indispensable for a successful uptake of TDM, but usually comes at a price. While the broadest possible access to tools and data should be the aspiration, providers also have a legitimate interest in restricting access, for example to protect their investments or for privacy-related reasons.

Technical and infrastructure
The main concerns are access to, and the quality of, available data. Stakeholders are confident that more reliable and easy-to-use tools and services will be developed, although the poor documentation and findability of relevant tools and services is currently reported as a barrier. Developing standards for data quality is seen as useful but most likely impossible, given the diversity of projects and requirements, which would make such standards too complex to comply with.

Economy and Incentives
The barriers mentioned include the lack of a single European market, the difficulties posed by multiple languages, and a lack of enforcement against US companies.

Further research
The interviews and case studies have provided evidence of, and insight into, the barriers that exist in Europe. To what extent these barriers can be overcome, given the different interests of the stakeholders involved, remains a topic for further research within the FutureTDM project. We will continue to work on recommendations, guidelines and best practices for improving the uptake of TDM in Europe, focused on addressing the barriers presented by the main stakeholders.

All findings, including policy recommendations, guidelines, case studies, best practices, practical tutorials, and help and how-to guides to increase TDM uptake, are shared through the FutureTDM platform. The FutureTDM awareness sheets, for example, cover a range of factors that affect TDM uptake, and were created from our expert reports, expert interviews and discussions at our Knowledge Café events. The reports completed so far are available from the Knowledge Library.

In the final six months of the FutureTDM project, there are many opportunities to find out more about the results and give your feedback on the situation around TDM in Europe. On 29 March, the second FutureTDM workshop will take place at the European Parliament in Brussels, and your input on TDM experiences on the ground is very welcome. With EU copyright reform now in progress, we are bringing together policy makers and stakeholder groups to share FutureTDM’s findings and our first expert-driven policy recommendations for increasing TDM in the EU. To find out more and sign up, please check the event page. We will showcase the final project results at the final FutureTDM symposium, organised in conjunction with the International Data Science Conference (12-13 June 2017, Salzburg, Austria).

Our animation explaining TDM and the importance of stakeholder engagement