You are browsing the archive for Steven De Costa.

Introducing Datashades.info, a CKAN Community Service

- September 23, 2019 in ckan

Do you use CKAN to power an open data portal? In this guest post Link Digital explains how you can take advantage of their latest open data initiative Datashades.info.

Datashades.info is a tool designed to deliver insights for researchers, portal managers, and the wider tech community to inform and support open data efforts relating to data hosted on CKAN platforms. Link Digital created the online service through a number of alpha releases and considers datashades.info, now in beta, as a long term initiative they expect to improve with more features in future releases. Specifically, Datashades.info provides a publicly-accessible index of metadata and statistics on CKAN data portals across the globe. For each portal, a number of statistics are aggregated and presented surrounding number of datasets, users, organisations and dataset tags. These statistics give portal managers the ability to quickly compare the size and scope of CKAN data portals to help inform their development roadmaps. Moreover, for each portal, installed plugin information is collected along with the relative penetration of those plugins across all portals in the index. This will enable CKAN developers to quickly see what extensions are the most popular and on what portals they are being used. Finally, all historical data is persisted and kept publically accessible, allowing researchers to analyse historical data trends in any indexed CKAN portal. Datashades.info was built to support a crowd-sourced indexing scheme. If a visitor searches for a CKAN portal and it is not found within the index, the system will immediately query that portal and attempt to generate a new index entry on-the-fly. Aggregation of a new portal’s statistics into Datashades.info also happens automatically. Maximise the tool and gain interesting information with the following features:

Globally Accessible open data

With Datashades.info, you can easily access an index of metadata and statistics on CKAN data portals across the globe. To do this, simply type in the portal’s URL on the homepage then click “Search“.

Integrated Values of All Metrics

After entering a portal’s URL, Datashades.info will load its information. After a few seconds, you will be able to see a range of data on portal users, datasets, resources, organisations, tags and plugins. Portal managers can access these via the individual portal page found on the site.

Easily-tracked Historical Data

Want to revisit data you previously explored? The tool also keeps old data in a historical index which users can explore any time on any portal page or by clicking “View All Data Portals” on the homepage.

Crowdsourcing

Datashades.info uses crowdsourcing to build its index. This means users can easily add any CKAN data portal not found on the site. To do this, simply search for a portal you know and it’ll be automatically added to the site and global statistics. As the project remains at a beta level of maturity, it is still wanting of improvements in many areas. But with the continuous feedback coming from the CKAN community, expect that more data and features will be added in future releases. For now, have a look around and stay tuned!  

Link Digital’s Enterprise CKAN Stack for AWS is Now Available on GitHub

- October 13, 2016 in Deployments, Featured, hosting, partners

As part of the commitment made at the White House Open Data Roundtable, Datashades, also trading as Link Digital, has recently released the preview of an Enterprise CKAN Stack for AWS.

The stack presents Link Digital’s best practice, with independently scalable layers, easily adapted to CI workflows and automated system maintenance. It is now freely available to use on our Datashades GitHub repository.

This OpsWorks stack has been in active use by Link Digital and presents a basis on which Link Digital builds and supports its Government Open Data platforms. Hence, the project can justly be called “eating your own dog food”.

Even now that there is a number of improvements in progress, we believe that the newly-published alpha version of the project will add value to the Public Data community.

To build an OpsWorks stack you will need these CloudFormation templates.
When entering parameters for the CloudFormation template you will need the following cookbook URL for the OpsWorks stack.

Steven De Costa at the IODC CKAN Booth

Steven De Costa at the IODC CKAN Booth

A longer monologue from a dev list discussion:

Attaching our high level architecture using RDS on AWS — for UAT and PROD: appendix_8_updated_aws-hosting-environment-2.

CloudFormation scripts for building out CKAN in a HA config can be found at https://github.com/DataShades/ckan-aws-templates

OpWorks version is here: https://github.com/DataShades/opswx-ckan-cookbook

Happy to collaborate on this and make it shine brighter :)

There are a few other relevant scripts under our datashades set of repos, such as the ASG one here: https://github.com/DataShades/updateasg

And, the general cloud storage one here: https://github.com/DataShades/ckanext-cloudstorage

And the S3 related one here: https://github.com/DataShades/ckanext-s3filestore

We’ve also improved the SSO approach with Saml2: https://github.com/DataShades/ckanext-saml2

And, begun some work for manipulating ACLs, which is important for private dataset resources you’d want to switch to ‘public’ when published: https://github.com/DataShades/ckanext-acl

Although not formally part of the CKAN roadmap I have a working model of where I’d like CKAN to head when it comes to enterprise file/data storage and access. If you are familiar with the concept of resource views then the idea I’m keen to pursue is similar. It is a concept of resource containers (not para-virtualization containers but storage or access point containers). The idea is to make CKAN extendable via extensions of a type that allow it to do more orchestration around how data is stored and made usable below the discovery layer of the metadata.

The story would be something like:
As a platform operator, I need to be able to configure a variety of storage and access endpoint possibilities, so that custodians can select where data is placed based on type of data or business need.

Resource container extensions would then be built to accommodate things like:

  1. Big data, transnational data feeds
  2. Semantic lakes
  3. Large file storage blobs
  4. Self declarative structured data (likely using data packaging/frictionless data)
  5. For cost auditing and accountability – storage into specified paid cloud accounts (different AWS, Azure, etc. accounts based on organisation)
I would image that resource view and resource container extensions would be paired in many cases to allow for the view to provide greater access and control of the data to provide an ability to query and extract insights from the data. The European Data Portal has around 650k datasets. It is true that once a CKAN portal gets to such a size then it can be a chore to do anything over the entire set of data in quick time. However, with the entire catalog readable via API there is a place for other tools to come into the picture to provide meta analysis or broader views over all data in a portal. CKAN’s structure allows for data ownership and custodianship to remain flexible as the governing entities change over time. If we keen those functions lightweight and build the more intensive data processing tasks within a resource container layer then I think that is the big win :) I see datastore and filestore as examples of resource containers. Datapusher is an example of an ETL that works with datastore but similar tools and concepts can be worked into the model and the open source goodness can grow organically to meet lots of different organisational needs. Where CKAN differs from other portal software, in my experience, is that it can be used for open Government data, research data, private sector data and ‘data as knowledge’ in virtually any situation. Other portal software appears to be built around capturing a particular market opportunity to generate data as knowledge for a particular customer segment – civic hackers, jurisdictional bureaucrats, open data policy implementations, etc. CKAN’s harvesting is good, but certainly not perfect. The approach for pushing from CKAN to elsewhere is likely to be used more in our future work, or as we refactor the architecture of current implementations. See: https://github.com/DataShades/ckanext-syndicate By using multiple CKAN environments it is pretty easy to have catalogs of ‘working data’ that then push to the ‘published data’ catalog. We use this approach for Government open data when from the bottom up you have agency data collected into CKAN based information asset registers. Sometimes the data doesn’t even exist, but the data management plan can at least first be registered prior to populating the dataset with resources. Once the data is ready it can then be published and syndicated upward to a higher level jurisdictional portal – such as a council, city, state or province. Similarly such datasets can then be syndicated upward again into a national or regional portal – perhaps with further ETL functions put in place to combine the similarly structured data from multiple agencies into a master dataset that presents a larger view of the entire data collection effort. If the domain of data collection differs, such as in a field of research, then the same architecture can still apply. Multiple research schools of chemistry, for example, could publish working data locally then syndicate upward into a global repository that allows for meta analysis of all research outcomes over the entire domain’s efforts. We’re working on a project in just this manner that is referenced here: http://linkdigital.com.au/news/2016/09/building-mdbox-an-open-access-simulation-data-repository-on-ckan-and-aws Lastly, published open data is the result of effort which is put into a process of data collection and, usually, some analysis and clean up. The tools used to process data, to prepare, collect or visulise are all part of the value a dataset represents. To bridge data and code we’ve released a very simple resource view for GitHub repositories that can be found here: https://github.com/DataShades/ckanext-githubrepopreview  Open Government initiatives are formed around principles of transparency, participation and collaboration. There is a desire to enable public-private collaboration over the long term and there is a role for Government to act as impresario to stimulate new markets and economic activity from publishing open data (ref: https://www.nesta.org.uk/sites/default/files/government_as_impresario.pdf). The reason we built the GitHub resource view is to encourage open source projects to emerge in connection to public datasets, via linking the opportunity for discovery of helpful code with the discovery of helpful datasets. Sorry for the long monologue! I could have more succinctly just said CKAN rocks, check out all the open source goodness surrounding it and jump in :)

Registration Open for CKANCon and Call for Speakers Closing This Friday

- August 18, 2016 in Featured

We are less than two months from CKANCon 2016, an official pre-event of this year’s International Open Data Conference, taking place in Madrid October 4!
As our community continues to grow rapidly, CKANCon will be a great opportunity to learn more about what others are doing with CKAN, and how you can use it in your organization.
We’ve had significant interest in speaking opportunities for this year’s event which is wonderful to see! Many speaking applications have come in the past few days so we are extending the deadline for speaker requests to this Friday, August 19. If you are interested in speaking, please fill out the CKANCon Speaker Request form before 12:00 a.m. EST this Friday.
Finally, we are happy to announce that registration for the event is now open! You can register today for both in person and online participation!
Looking forward to seeing everyone soon!

Open Knowledge Australia Board Meeting 29 September 2015

- October 1, 2015 in Featured, News

Minutes will be available once approved.

Open Knowledge Australia Board Meeting 29 September 2015

- October 1, 2015 in Featured, News

Minutes will be available once approved.

Open Knowledge Australia Board Meeting 29 September 2015

- October 1, 2015 in Featured, News

Minutes will be available once approved.

Open Knowledge Australia Board Meeting 29 September 2015

- October 1, 2015 in Featured, News

Minutes will be available once approved.

Pyramids, Pipelines and a Can-of-Sweave – CKAN Asia-Pacific Meetup

- September 18, 2015 in åben data, internationalt, Open Data Index

Florian Mayer from the Western Australian Department of Parks and Wildlife presents various methods he is using to create Wisdom.

Data+Code = Information;
Information + Context = Wisdom

So, can this be done with workbooks, applications and active documents? As Florian might say, “Yes it CKAN”!

Grab the code and materials related to the work from here:
http://catalogue.alpha.data.wa.gov.au/dataset/data-wa-gov-au

asia-pacificThis presentation was given at the first Asia-Pacific CKAN meetup on the 17th of September, hosted at Link Digital, as an initiative of the CKAN Community and Communications team. You can join the meetup and come along to these fortnightly sessions via video conference.

If you have some interesting content to present then please get in touch with @starl3n to schedule a session.

Pyramids, Pipelines and a Can-of-Sweave – CKAN Asia-Pacific Meetup

- September 18, 2015 in community, Featured, Presentations

Florian Mayer from the Western Australian Department of Parks and Wildlife presents various methods he is using to create Wisdom.

Data+Code = Information;
Information + Context = Wisdom

So, can this be done with workbooks, applications and active documents? As Florian might say, “Yes it CKAN”!

Grab the code and materials related to the work from here:
http://catalogue.alpha.data.wa.gov.au/dataset/data-wa-gov-au

asia-pacificThis presentation was given at the first Asia-Pacific CKAN meetup on the 17th of September, hosted at Link Digital, as an initiative of the CKAN Community and Communications team. You can join the meetup and come along to these fortnightly sessions via video conference.

If you have some interesting content to present then please get in touch with @starl3n to schedule a session.

Pyramids, Pipelines and a Can-of-Sweave – CKAN Asia-Pacific Meetup

- September 18, 2015 in community, Featured, Presentations

Florian Mayer from the Western Australian Department of Parks and Wildlife presents various methods he is using to create Wisdom.

Data+Code = Information; Information + Context = Wisdom
So, can this be done with workbooks, applications and active documents? As Florian might say, “Yes it CKAN”!

Grab the code and materials related to the work from here: http://catalogue.alpha.data.wa.gov.au/dataset/data-wa-gov-au

asia-pacificThis presentation was given at the first Asia-Pacific CKAN meetup on the 17th of September, hosted at Link Digital, as an initiative of the CKAN Community and Communications team. You can join the meetup and come along to these fortnightly sessions via video conference.

If you have some interesting content to present then please get in touch with @starl3n to schedule a session.