You are browsing the archive for Technical.

Frictionless DarwinCore Tool by André Heughebaert

- December 9, 2019 in Frictionless Data, Open Knowledge, Open Research, Open Science, Open Software, Technical

This blog is part of a series showcasing projects developed during the 2019 Frictionless Data Tool Fund. The 2019 Frictionless Data Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

Frictionless DarwinCore, developed by André Heughebaert

André Heughebaert is an open biodiversity data advocate in his work and his free time. He is an IT Software Engineer at the Belgian Biodiversity Platform and is also the Belgian GBIF (Global Biodiversity Information Facility) Node manager. In these roles he works with the Darwin Core standards and open biodiversity data on a daily basis. This work inspired him to apply for the Tool Fund, where he has developed a tool to convert DarwinCore Archives into Frictionless Data Packages.

The DarwinCore Archive (DwCA) is a standardised container for biodiversity data and metadata widely used within the GBIF community, which consists of more than 1,500 institutions around the world. The DwCA is used to publish biodiversity data about observations, collection specimens, species checklists and sampling events. However, this domain-specific standard has some limitations: mainly the star schema (core table + extensions), rules that are sometimes too permissive, and a lack of controlled vocabularies for certain terms. These limitations encouraged André to investigate emerging open data standards. In 2016, he discovered Frictionless Data and published his first data package, on historical data from the 1815 Napoleonic campaign in Belgium. He was then encouraged to create a tool that would, in part, build a bridge between these two open data ecosystems.

As a result, the Frictionless DarwinCore tool converts DwCA into Frictionless Data Packages, and also gives access to the vast Frictionless Data software ecosystem, enabling constraint validation and support for a fully relational data schema. Technically speaking, the tool is implemented as a Python library and is exposed as a Command Line Interface.
The tool automatically converts:

* the DwCA data schema into datapackage.json
* the EML metadata into a human-readable markdown README file
* the data files themselves, when necessary (i.e. when default values are defined)

The resulting zip file complies with both the DarwinCore and Frictionless specifications.

André hopes that bridging the two standards will give the GBIF community an excellent opportunity to provide open biodiversity data to a wider audience. He says this is also a good opportunity to discover the Frictionless Data specifications and assess their applicability to the biodiversity domain. In fact, on 9th October 2019, André presented the tool at a GBIF Global Nodes meeting, where it was received by the node managers' community as exploratory and pioneering work. While the command line interface offers a simple user interface for non-programmers, others might prefer the more flexible and sophisticated Python API. André encourages anyone working with DarwinCore data, including all data publishers and data users of the GBIF network, to try out the new tool.
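To make the schema conversion concrete, here is an illustrative sketch of what a datapackage.json for a DwCA occurrence core might contain. The field list, types and constraints are assumptions for illustration, not the tool's actual output:

```python
import json

# Illustrative Frictionless descriptor for a DwCA occurrence core table.
# The field names are real Darwin Core terms, but the types and
# constraints shown here are assumptions, not the tool's exact mapping.
descriptor = {
    "name": "dwca-occurrences-example",
    "profile": "tabular-data-package",
    "resources": [{
        "name": "occurrence",
        "path": "occurrence.txt",
        "profile": "tabular-data-resource",
        "schema": {
            "fields": [
                {"name": "occurrenceID", "type": "string",
                 "constraints": {"required": True, "unique": True}},
                {"name": "scientificName", "type": "string"},
                {"name": "eventDate", "type": "date"},
                {"name": "decimalLatitude", "type": "number",
                 "constraints": {"minimum": -90, "maximum": 90}},
            ],
            "primaryKey": "occurrenceID",
        },
    }],
}

# Serialise the descriptor the way it would be written to datapackage.json.
datapackage_json = json.dumps(descriptor, indent=2)
```

The win over a raw DwCA is that these constraints (required, unique, numeric ranges) become machine-checkable by any Frictionless tool, rather than conventions left to the publisher.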
“I’m quite optimistic that the project will feed the necessary reflection on the evolution of our biodiversity standards and data flows.”
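Because the converted archive has to satisfy both specifications at once, one quick sanity check is that the zip still carries the DwCA descriptor (meta.xml) alongside the new Frictionless one (datapackage.json). A minimal sketch of such a check, run here against a tiny in-memory zip standing in for a converted archive:

```python
import io
import zipfile

def is_dual_compliant(zip_bytes):
    """Check that an archive carries both the DwCA descriptor (meta.xml)
    and the Frictionless descriptor (datapackage.json)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
    return "meta.xml" in names and "datapackage.json" in names

# Stand-in for a converted archive: a tiny zip holding both descriptors
# plus one data file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("meta.xml", "<archive/>")
    zf.writestr("datapackage.json", "{}")
    zf.writestr("occurrence.txt", "occurrenceID\n1\n")

print(is_dual_compliant(buf.getvalue()))  # → True
```

Full compliance of course also requires valid content in both descriptors; this only checks that neither ecosystem's entry point has been lost in conversion.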

To get started, the tool is installed with a single pip install command (full directions can be found in the project README). Central to the tool is a table of DarwinCore terms linking a Data Package type, format and constraints to every DwC term. The tool can be used as a CLI directly from your terminal window or as a Python library for developers, and it works with either locally stored or online DwCA. Once converted to a Tabular Data Package, the DwC data can be ingested and further processed by software such as Goodtables, OpenRefine or any other Frictionless Data software. André has aspirations to take the Frictionless DarwinCore tool further by encapsulating it in a web service that delivers Goodtables reports directly from a DwCA, which will make it even more user friendly. A further idea for improvement is an import pathway for DarwinCore data into OpenRefine, a popular tool in the GBIF community. André's long-term hope is that the Data Package will become an optional format for data download on GBIF.org.

Further reading:
Repository: https://github.com/frictionlessdata/FrictionlessDarwinCore
Project blog: https://andrejjh.github.io/fdwc.github.io/

Announcing A New Architectural Roadmap for OpenSpending

- April 15, 2015 in #contribute, Technical

  At the 2015 Open Data Day a proposal for a new vision for the approach and architecture of OpenSpending was approved. It opens up an exciting opportunity for open budget initiatives around the world to work more closely together, whilst remaining independent. In a nutshell:   We want to centralize data but decentralize ‘presentation’ […]

Presenting public finance just got easier

- March 20, 2015 in Technical, Updates

This blog post is cross-posted from the CKAN blog. CKAN 2.3 is out! The world-famous data handling software suite which powers data.gov, data.gov.uk and numerous other open data portals across the world has been significantly upgraded. How can this version open up new opportunities for existing and coming deployments? Read on. One of the new […]

How to create a budget data package

- October 15, 2014 in Technical, tutorials

This tutorial will show you how to create a budget data package from a (relatively clean) spreadsheet dataset by walking you through the process of converting the Armenian budget from the Open Budgets Portal. Getting started The Armenia BOOST government expenditure database contains planned, adjusted, and executed expenditures covering the years 2006 to 2012. It […]

Community Sessions: Video Skillshare and Open Education

- June 9, 2014 in cameralibre, Events, Featured, OKFestival, Open Data, open-education, Technical, Video

Happy June! We have a few Community Sessions to announce. OKFestival is almost a month away. Videos are key for storytelling, so we are hosting a Video Skillshare to help us all learn. The Open Education Working Group will join us to talk about why open data matters in education. Join us for these two community sessions.
Take a Video: Preparing for OKFestival
cameras in baskets
Storytelling is key to building Open. Join Sam Muirhead of Cameralibre and the Open Knowledge team to learn some tips and tricks about video. We are preparing for OKFestival and hope this skillshare helps everyone.
  • Date: Thursday, June 12, 2014
  • Time: 9:30 EDT/13:30 UTC/14:30 BST/15:30 CST
  • Our guest is Sam Muirhead.
  • Duration: 1 hour (This will be recorded)
  • Register
We’ll cover topics like:
  • What you need to think about before and during shooting to make sure footage is high quality and relevant
  • Hard-to-fix but easy-to-avoid mistakes
  • Tips and tricks for editing a simple interview or event video
  • Some very basic technical guidelines, e.g. what settings to use for recording, exporting, etc.
Sam was kind enough to share some resources.
Why Open Data matters to Education
Open Education is a very active global community. Join Marieke and Octavio to learn more about why open data matters to education, about the many facets of open education, and about how to get involved. This session builds on the Make it Matter workshop, all about using open methods in education. See all previous Making it Matter workshop videos. We’ll talk all about open data in education, learn about the Open Education Working Group and hear about work in Brazil and the UK.
  • Date: Thursday, June 26, 2014
  • Time: 8:00 EDT / 12:00 UTC / 13:00 BST/14:00 CEST
  • Duration: 1 hour
  • Register
If you have ideas for upcoming sessions, please ping heather DOT leson AT okfn DOT org. (Photo by Heather Leson, Venice Biennale. Art by Magdalena Campos-Pons)

Energy Buildings Performance Scenarios as Linked Open Data

- June 6, 2014 in GBPN, linked-open-data, OKF Austria, Semantic Web Company, Technical

This is a blog post by Martin Kaltenböck & Anne-Claire Bellec, cross-posted from the Semantic Puzzle Blog. Anne-Claire Bellec is Communications Manager at the Global Buildings Performance Network (http://www.gbpn.org), based at GBPN’s headquarters in Paris, France, and Martin Kaltenböck is responsible for web-based data tools at Semantic Web Company, a Linked Open Data specialised IT company located in Vienna, Austria, as well as a Member of the Board of the Austrian chapter of Open Knowledge.

The reduction of greenhouse gas emissions is one of the big global challenges for the coming decades. (Linked) Open Data on this multi-domain challenge is key to addressing the issues in policy, construction, energy efficiency, production and the like. Today – on World Environment Day 2014 – a new (linked open) data initiative contributes to this effort: GBPN’s Data Endpoint for Building Energy Performance Scenarios.

GBPN (The Global Buildings Performance Network) provides the full dataset of a recently conducted global scenario analysis for saving energy in the building sector worldwide, projected from 2005 to 2050. The multidimensional dataset includes parameters like housing types, building vintages and energy uses – for various climate zones and regions – and is freely available for full use and re-use as open data under the CC-BY 3.0 France license. To make exploration easy, the Semantic Web Company has developed an interactive query/filtering tool which allows users to create graphs and tables by slicing this multidimensional data cube. Chosen results can be exported as open data in the open formats RDF and CSV, and can also be queried via a provided SPARQL endpoint (a semantic-web-based data API). A built-in query builder makes the use, learning and understanding of SPARQL easy – for advanced users as well as for non-experts and beginners.
The LOD-based information and data system is part of Semantic Web Company’s recent PoolParty Semantic Drupal developments and is based on OpenLink’s Virtuoso 7 QuadStore, holding and calculating ~235 million triples, and it makes use of the RDF ETL tool UnifiedViews as well as D2R Server for RDF conversion. The underlying GBPN ontology runs on PoolParty 4.2 and also serves a powerful domain-specific news aggregator realised with SWC’s sOnr webminer.

Together with other energy-efficiency-related Linked Open Data initiatives like REEEP, NREL and BPIE, GBPN’s recent initiative is a contribution towards broader availability of data supporting action against global warming – as Dr. Peter Graham, Executive Director of GBPN, emphasised: “…data and modelling of building energy use has long been difficult or expensive to access – yet it is critical to policy development and investment in low-energy buildings. With the release of the BEPS open data model, GBPN are providing free access to the world’s best aggregated data analyses on building energy performance.”

The Linked Open Data (LOD) is modelled using the RDF Data Cube Vocabulary (a W3C recommendation), with 17 dimensions in the cube. In total there are 235 million triples available in RDF, including links to DBpedia and Geonames – linking the indicators (years, climate zones, regions and building types) as well as user scenarios.
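To give a flavour of what querying such a Data Cube endpoint looks like, here is a rough sketch of a SPARQL query assembled as a Python string. The qb: prefix is the real W3C Data Cube vocabulary, but the gbpn: dimension URIs are hypothetical placeholders, not GBPN’s actual ontology terms:

```python
# Illustrative SPARQL for an RDF Data Cube endpoint: select observations
# for one climate zone and return the year and the modelled energy use.
# The gbpn: URIs below are hypothetical placeholders, not real terms.
PREFIXES = """\
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX gbpn: <http://example.org/gbpn/ns#>
"""

def scenario_query(climate_zone):
    """Build a query filtering the cube on one (hypothetical) dimension."""
    return PREFIXES + f"""
SELECT ?year ?energyUse WHERE {{
  ?obs a qb:Observation ;
       gbpn:climateZone gbpn:{climate_zone} ;
       gbpn:year ?year ;
       gbpn:energyUse ?energyUse .
}}
ORDER BY ?year
"""

q = scenario_query("Cold")
```

A real query would use the dimension and measure URIs published in the GBPN ontology; the overall shape – filtering qb:Observation resources by dimension values – is exactly what the Data Cube vocabulary standardises across such endpoints.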

Meet OpenSpending version 0.13.0

- May 8, 2014 in Releases, Technical

This is going to be a slightly technical post (and has already been posted to the developer mailing list), but still it’s an important change so everyone is encouraged to read it. If you don’t understand something, then that’s just fine, it probably does not have anything to do with you and you can skip […]