
This is what Europe can do to stimulate Text and Data Mining

- September 12, 2017 in FutureTDM, text and data mining

This press release has been reposted from the FutureTDM website.

Text and data mining – using algorithms to analyse content in ways that would be impossible for humans – is shaping up to be a vital research tool of the 21st century. But Europe lags behind other parts of the world in adopting these new technologies. The FutureTDM project has just concluded its two-year, EC-funded research into what is holding Europe back. The project consortium of 10 European partners, led by SYNYO, met with stakeholders and experts from all over Europe, gathering input and carrying out research to understand what steps Europe can take to support the uptake of TDM. Open Knowledge International, together with ContentMine, led the work on communication, mobilisation and networking, and undertook research into best practices and methodologies.

The potential benefits – and risks – are huge. According to the project’s economic analysis, TDM technologies could have an impact of as much as USD 110 billion on the European economy by 2020. If Europe is not ready to foster and support the use of TDM, it risks seeing talent and economic benefits go elsewhere.

Legal barriers are a big problem. TDM processes often involve copying content for analysis, so applications of TDM may fall foul of copyright law. The EU has a fragmented landscape of restrictive and often unclear laws that can limit the re-use of content for TDM.

Skills and education in this area also need a boost. Data analysis is fast becoming “the new IT”, and people in all fields, from fashion to finance, could benefit from an education in fundamental data literacy and computational thinking.

Lack of infrastructure and economic incentives are lesser concerns. More information on these barriers is available in the FutureTDM report Policies and Barriers of TDM in Europe.

FutureTDM has put together real, practical proposals to support the uptake of TDM in Europe.
These are summarised in a Roadmap for the EU which focuses on three key phases of support:
  1. Content Availability: making sure content is legally and practically discoverable and re-usable for TDM. Since rights clearance can be practically impossible for many TDM applications, it almost certainly means copyright reform to allow re-use of content that doesn’t trade on the original creative expression.
  2. Support Early Adopters: there is a need for initiatives that will connect TDM practitioners across domains and sectors, helping them share best practices and learn from each other’s experiences.
  3. The Next Generation: it is important to build a ‘data-savvy’ culture, where all Europeans have a fundamental awareness of the potential uses and benefits of data analytics.
The FutureTDM platform brings together all the results of the FutureTDM project. As well as databases of TDM projects, experts, methods and tools, its Knowledge Base includes a series of practical guidelines for stakeholders in the TDM landscape. These resources offer straightforward, plainly worded advice on legal, licensing and data management issues, as well as on how universities in particular can play a key role in supporting the uptake of TDM in Europe. All outcomes are also summarised in the awareness sheet Outcomes of FutureTDM.

The FutureTDM project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No 665940.
For further questions please contact: / Tel +43 1 9962011

FutureTDM symposium: sharing project findings, policy guidelines and practitioner recommendations

- July 13, 2017 in FutureTDM

The FutureTDM project, in which Open Knowledge International participates, actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help improve the uptake of text and data mining (TDM) in Europe (read more). Last month, we held our FutureTDM Symposium at the International Data Science Conference 2017 in Salzburg, Austria. With the project drawing to a close, we shared the project findings and our first expert-driven policy recommendations and practitioner guidelines. This blog report has been adapted from the original version on the FutureTDM blog.

The FutureTDM track at the International Data Science Conference 2017 opened with a speech by Bernhard Jäger from SYNYO, who gave a brief introduction to the project and explained the purpose of the Symposium: bringing together policy makers and stakeholder groups to share FutureTDM’s findings on how to increase TDM uptake. This was followed by a keynote speech on the Economic Potential of Data Analytics by Jan Strycharz from Fundacja Projekt Polska, a FutureTDM project partner. It was estimated that automated (big) data and analytics, if developed properly, will bring over EUR 200 billion to European GDP by 2020. This means that algorithms (not to say robots) will then be responsible for 1.9% of European GDP. You can read more about the impact of TDM on the economy in our report Trend analysis, future applications and economics of TDM.

Dealing with the legal bumps

The plenary session with keynote speeches was followed by the panel Data Analytics and the Legal Landscape: Intellectual Property and Data Protection. As an introduction to this legal session, Freyja van den Boom from Open Knowledge International presented our findings on the legal barriers to TDM uptake, which mainly relate to the type of content and the applicable regime (IP or data protection). Having gathered evidence from the TDM community, FutureTDM has identified three types of barriers (uncertainty, fragmentation and restrictiveness) and developed guidelines recommending how to overcome them. We have summarised this in our awareness sheet Legal Barriers and Recommendations.

This was followed by statements from the panelists. Prodromos Tsiavos (Onassis Cultural Centre / IP Advisor) stressed that, with the recent changes in the European framework, the law faces significant issues and balancing industrial interests is becoming necessary. He added that a different approach is certainly needed to initiate uptake by industry, because industry will continue with licensing arrangements. Duncan Campbell (John Wiley & Sons, Inc.) concentrated on copyright and IP issues. How do we deal with all the knowledge created? How does copyright influence this? He spoke about the European Commission proposal and the UK TDM exception, and how to make an exception work. Marie Timmermann (Science Europe) also focused on the TDM exception and its positive and negative sides. On the positive side, the TDM exception has moved from being optional to mandatory, and it cannot be overridden. On the negative side, the exception is very limited in scope: startups and SMEs do not fall under it. As a result, Europe risks losing promising researchers to other parts of the world. This point was also supported by Romy Sigl (AustrianStartups). She confirmed that anybody can create a startup today, but if startups are not supported by legislation, they move to another country where they see more potential.

The right to read is the right to mine

The next panel was devoted to an overview of FutureTDM case studies: Startups to Multinationals. Freyja van den Boom (OKI) gave an overview of the highlights of the stakeholder consultations, which cover different areas and stakeholder groups within the TDM domain. Peter Murray-Rust (ContentMine) presented a researcher’s view and stressed that the right to read is the right to mine, yet there is no legal certainty about what a researcher is and is not allowed to do. Petr Knoth from CORE added that he believed we need data infrastructure to support TDM: data scientists are so busy cleaning data that they have little time left for the actual mining. He added that the infrastructure should not be operated by publishers, but that publishers should provide support. Donat Agosti from PLAZI focused on how to make data accessible so that everybody can use it. He cited PLAZI’s repository, TreatmentBank, which is open, extracts each article and creates citable data; once you have the data, you can disseminate it. Kim Nilsson from PIVIGO spoke about support for academics: PIVIGO has already worked with 70 companies and provided TDM support to 400 PhD academics. She mentioned how important data analytics, and the ability to see all the connections and correlations, are for sectors such as medicine. She stressed that data analytics is also extremely important for startups, for whom gaining access is critical.

Data science is the new IT

The next panel was devoted to Universities, TDM and the need for strategic thinking on educating researchers. FutureTDM project officer Kiera McNeice (British Library) gave an overview of the skills and education barriers to TDM. She noted that many people say they need quite a lot of knowledge to use TDM, and that there is a skills gap between academia and industry. Barriers to entry also remain high, because using TDM tools often requires programming knowledge. We have put together a series of guidelines to help stakeholders overcome the barriers we have identified. Our policy guidelines include encouraging universities to support TDM through both their research and education arms, for example by helping university senior management understand the needs of researchers around TDM and the potential benefits of supporting it. You can read more in our Baseline report of policies and barriers of TDM in Europe, or walk through them via our Knowledge Base.

Kim Nilsson from PIVIGO stressed that the main challenge is software skills. The fact is that if you can do TDM, you have fantastic options: startups, healthcare, charity. PIVIGO’s task is to offer proper career advice, help people understand what kinds of skills are valued and assist them in building those skills. Claire Sewell (Cambridge University Library) elaborated on skills from the perspective of an academic librarian. What matters is a basic understanding of copyright law, and keeping up with technical and data skills. “We want to make sure that if a researcher comes into the library we are able to help him,” she concluded. Jonas Holm from Stockholm University Library highlighted that very little strategic thinking is going on in the TDM area. “We have struggled to find much strategical thinking on TDM area. Who is strategically looking for improving the uptake at the universities? We couldn’t find much around Europe,” he said. Stefan Kasberger (ContentMine) stressed that the social side of education, meaning inclusion and diversity, is also important.

Infrastructure for Technology Implementation

The last session was dedicated to technologies and infrastructures supporting Text and Data Analytics: challenges and solutions. FutureTDM Project Officer Maria Eskevich (Radboud University) delivered a presentation on the TDM landscape with respect to infrastructure for technical implementation. Stelios Piperidis from OpenMinTed stressed the need for an infrastructure: “Following more on what we have discussed, it looks that TDM infrastructure has to respond to 3 key questions: How can I get hold on the data that I need? How can I find the tool to mine the data? How can I deploy the work carried out?” Mihai Lupu from Data Market Austria brought up the issue of data formats: for example, there is a lot of data in CSV files that people don’t know how to deal with. Maria Gavrilidou (clarin:el) highlighted that formats are not the only problem: identifying the source of data and putting in place lawful procedures with respect to this data are problems too. Metadata is also problematic, because it very often does not exist. Nelson Silva (know-centre) focused on using proper tools for mining the data. Very often no particular tool meets your needs, and you have to either develop one or search for open source tools. Another challenge is the quality of the data: how far can you rely on it, and how should you visualise it? And finally, how can you be sure that people get the right message?


The closing session was conducted by Kiera McNeice (British Library), who presented A Roadmap to promoting greater uptake of Data Analytics in Europe.  Finally, we also had a Demo Session with flash presentations by:
  • Stefan Kasberger (ContentMine),
  • Donat Agosti (PLAZI),
  • Petr Knoth (CORE),
  • John Thompson-Ralf Klinkenberg (Rapidminer),
  • Maria Gavrilidou (clarin:el),
  • Alessio Palmero Aprosio (ALCIDE)
You can find all FutureTDM reports in our Knowledge Library, or visit our Knowledge Base: a structured collection of resources on Text and Data Mining (TDM) that has been gathered throughout the FutureTDM project.  

FutureTDM: The Future of Text and Data Mining

- March 3, 2017 in FutureTDM, OKI Projects, text and data mining

Blog written by Freyja van den Boom (FutureTDM researcher) and Lieke Ploeger.

Since September 2015 Open Knowledge International has been working on finding new ways to improve the uptake of text and data mining in the EU, as part of the FutureTDM project. Text and data mining (TDM) is the process of extracting relevant information from large amounts of machine-readable data (such as scientific papers) and recombining this to unlock new knowledge and power innovation (see ‘Techniques, Tools & Technologies for TDM in Europe’). Project partners include libraries, publishers and universities, but also the non-profit organisation ContentMine, which advocates for the right to mine content. Open Knowledge International leads the work on communication, mobilisation and networking and undertakes the research into best practices and methodologies.

A practical example explaining the use of TDM
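To make the definition of TDM in this post a little more concrete, here is a toy sketch in Python of the “extract, then recombine” idea. It is not a FutureTDM tool or method: the three-sentence corpus and the term list are hypothetical stand-ins for a real collection of papers, and the “bridge term” step is just one simple, illustrative way (in the spirit of literature-based discovery) to show how facts found in separate documents can be combined into a hypothesis that no single document states.

```python
"""Toy text-and-data-mining sketch: extract term co-occurrences, then recombine them.

The corpus and vocabulary below are hypothetical illustrations, not FutureTDM data.
"""
import re
from collections import Counter
from itertools import combinations

# Hypothetical corpus standing in for machine-readable scientific papers.
DOCUMENTS = [
    "Aspirin reduces inflammation in mice.",
    "Chronic inflammation is linked to heart disease.",
    "Heart disease remains a leading cause of death.",
]

# Hypothetical vocabulary of terms to mine for (real systems would use entity recognition).
TERMS = {"aspirin", "inflammation", "heart disease"}


def find_terms(text: str) -> set[str]:
    """Extraction step: return the vocabulary terms mentioned in one document."""
    lowered = text.lower()
    return {t for t in TERMS if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def co_occurrences(docs: list[str]) -> Counter:
    """Count how often each (alphabetically ordered) pair of terms shares a document."""
    counts: Counter = Counter()
    for doc in docs:
        for pair in combinations(sorted(find_terms(doc)), 2):
            counts[pair] += 1
    return counts


def indirect_links(pairs: Counter) -> set[tuple[str, str, str]]:
    """Recombination step: find (a, bridge, c) where a and c never co-occur
    directly but each co-occurs with the same bridge term."""
    neighbours: dict[str, set[str]] = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    links = set()
    for a, c in combinations(sorted(neighbours), 2):
        if c in neighbours[a]:
            continue  # already linked directly, so nothing new to propose
        for bridge in sorted(neighbours[a] & neighbours[c]):
            links.add((a, bridge, c))
    return links


if __name__ == "__main__":
    direct = co_occurrences(DOCUMENTS)
    for (a, b), n in sorted(direct.items()):
        print(f"direct: {a} + {b} ({n} document(s))")
    for a, bridge, c in sorted(indirect_links(direct)):
        print(f"candidate: {a} -> {bridge} -> {c}")
```

In this toy corpus, aspirin and heart disease never appear in the same document, but each co-occurs with inflammation, so the sketch proposes that chain as a candidate link for a researcher to check. Real TDM work typically runs on far larger corpora and needs lawful access to the full text, which is exactly where the legal questions discussed in this post come in.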

Because the use of TDM is significantly lower in Europe than in some countries in the Americas and Asia, FutureTDM actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help pinpoint why uptake is lower, raise awareness of TDM and develop solutions. This is especially important now, because an exception for TDM under copyright law is being discussed at European level. Such an exception would make copyright law less restrictive for TDM carried out under certain circumstances.

Throughout 2016 we organised Knowledge Cafés across Europe as an informal opportunity to gather feedback on text and data mining from researchers, developers, publishers, SMEs and any other stakeholder groups working in the field, and held stakeholder consultations with the various communities. In September 2016 we held the first of two workshops in Brussels to discuss the project’s findings, with many MEPs and policymakers present. In early 2017 a roundtable was organised at the Computer Privacy and Data Protection (CPDP) conference in Brussels, where the impact of data protection regulations on the uptake of advanced data analysis technologies like TDM was discussed.

MEP Julia Reda discussing the upcoming copyright reform at the FutureTDM workshop

Below are some of the insights we have gained through our research so far, including the main barriers for different TDM stakeholder communities. In the upcoming months we will publish more of the results and proposed solutions for overcoming these barriers.

Education and skills
There is a need for more education for researchers on the benefits and practical use of TDM: industry, the publishing community and academia should work together to develop effective courses aimed at different levels, depending on the discipline and the type of research that is likely to use TDM. We are currently working on TDM education and are looking for feedback on what the learning outcomes should be. If you are interested in getting involved, contact us!

Legal and policy
There is no clarity about the legal status of TDM practices and of the use of results obtained through TDM. Barriers include uncertainty about the scope of copyright, database protection, and privacy and data protection regulations. See for example our guest blog here.
The current copyright reform discussions focus partly on a TDM exception, which could help provide more clarity. Under discussion is, for example, which data and which uses fall under copyright, such as whether there should be a distinction between commercial and non-commercial use. FutureTDM partners are monitoring these developments.

We have recently published the FutureTDM policy framework introducing high level principles that should be the foundation of every stakeholder action that aims to promote TDM. These high level principles are:
  • Awareness and Clarity: actions should improve certainty on the use of TDM technologies. Information and clear actions are crucial for a flourishing TDM environment in Europe.
  • TDM without Boundaries: insofar as appropriate, boundaries should be removed to prevent and eliminate fragmentation in the TDM landscape.
  • Equitable Access: access to TDM tools and technologies, as well as to sources (such as datasets), is indispensable for a successful uptake of TDM, but usually comes at a price. While the broadest possible access to tools and data should be the aspiration, providers of these also have a legitimate interest in restricting access, for example to protect their investments or for privacy-related reasons.
Technical and infrastructure
The main concerns are access to available data and its quality. There is confidence in the development of more reliable and easy-to-use tools and services, although the documentation and findability of relevant tools and services is currently reported as a barrier. Developing standards for data quality is seen as a useful but most likely impossible solution, given the diversity of projects and requirements, which would make standards too complex to comply with.
Economy and Incentives
The barriers mentioned are the lack of a single European market, the problems of having multiple languages and a lack of enforcement for US companies.
Further research
The interviews and case studies have provided evidence of, and insight into, the barriers that exist in Europe. To what extent these barriers can be solved, given the different interests of the stakeholders involved, remains a topic for further research within the FutureTDM project. We will continue to work on recommendations, guidelines and best practices for improving the uptake of TDM in Europe, focused on addressing the barriers presented by the main stakeholders.

All findings, including policy recommendations, guidelines, case studies, best practices, practical tutorials, help and how-to guides to increase TDM uptake, are shared through the FutureTDM platform. The FutureTDM awareness sheets, for example, cover a range of factors that have an impact on TDM uptake and were created from our expert reports, expert interviews and discussions at our Knowledge Café events. The reports completed so far are available from the Knowledge Library.

In the final six months of the FutureTDM project, there are many opportunities to find out more about the results and give your feedback on the situation around TDM in Europe. On 29 March, the second FutureTDM workshop will take place at the European Parliament in Brussels, where your input on TDM experiences on the ground is very welcome. With EU copyright reform now in progress, we bring together policy makers and stakeholder groups so that we can share FutureTDM’s findings and our first expert-driven policy recommendations that can help increase TDM uptake in the EU. To find out more and sign up, please check the event page. We will showcase the final project results at the final FutureTDM symposium, organised in conjunction with the International Data Science Conference (12-13 June 2017, Salzburg, Austria).

Our animation explaining TDM and the importance of stakeholder engagement