You are browsing the archive for Steven De Costa.

Link Digital’s Enterprise CKAN Stack for AWS is Now Available on GitHub

- October 13, 2016 in Deployments, Featured, hosting, partners

As part of the commitment made at the White House Open Data Roundtable, Datashades, also trading as Link Digital, has recently released a preview of an Enterprise CKAN Stack for AWS.

The stack represents Link Digital’s best practice: independently scalable layers that are easily adapted to CI workflows and automated system maintenance. It is now freely available in our Datashades GitHub repository.

This OpsWorks stack has been in active use at Link Digital and provides the basis on which we build and support our Government open data platforms. Hence, the project can justly be called a case of “eating your own dog food”.

Although a number of improvements are still in progress, we believe that the newly published alpha version of the project will add value to the public data community.

To build an OpsWorks stack you will need these CloudFormation templates.
When entering parameters for the CloudFormation template you will need the following cookbook URL for the OpsWorks stack.
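
Template parameters are ultimately handed to CloudFormation as a list of key/value pairs. The Python below is a minimal sketch of that shape, as expected by the `Parameters` argument of CloudFormation's CreateStack call. It assumes hypothetical parameter names (`CookbookURL`, `EnvironmentName`); check the template in DataShades/ckan-aws-templates for the real keys. The cookbook URL is left as a placeholder rather than guessed.

```python
# Sketch: build the Parameters list for a CloudFormation stack.
# The keys below are HYPOTHETICAL; use the names defined by the template you deploy.
from typing import Dict, List

STACK_PARAMETERS: Dict[str, str] = {
    "CookbookURL": "REPLACE_WITH_COOKBOOK_URL",  # placeholder, not a real URL
    "EnvironmentName": "uat",                    # hypothetical example value
}


def to_cfn_parameters(params: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the list of ParameterKey/ParameterValue
    pairs that CloudFormation's CreateStack call expects."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(params.items())
    ]


if __name__ == "__main__":
    for param in to_cfn_parameters(STACK_PARAMETERS):
        print(param)
    # With boto3 installed and AWS credentials configured, this list could be
    # passed as the Parameters argument of
    # boto3.client("cloudformation").create_stack(StackName=..., TemplateURL=..., ...).
```

Keeping the values in a plain dict makes it easy to maintain one file per environment (UAT, PROD) and convert at deploy time.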

Steven De Costa at the IODC CKAN Booth

A longer monologue from a dev list discussion:

Attached is our high-level architecture using RDS on AWS, for UAT and PROD: appendix_8_updated_aws-hosting-environment-2.

CloudFormation scripts for building out CKAN in a high-availability (HA) configuration can be found at https://github.com/DataShades/ckan-aws-templates

The OpsWorks version is here: https://github.com/DataShades/opswx-ckan-cookbook

Happy to collaborate on this and make it shine brighter :)

There are a few other relevant scripts in our DataShades set of repos, such as the Auto Scaling group (ASG) one here: https://github.com/DataShades/updateasg

And, the general cloud storage one here: https://github.com/DataShades/ckanext-cloudstorage

And the S3 related one here: https://github.com/DataShades/ckanext-s3filestore

We’ve also improved the SSO approach with Saml2: https://github.com/DataShades/ckanext-saml2

And we’ve begun some work on manipulating ACLs, which is important for private dataset resources that you’d want to switch to ‘public’ when published: https://github.com/DataShades/ckanext-acl
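
Switching a resource from private to public comes down to mapping a dataset's visibility onto an object permission. The following is a minimal Python sketch, assuming resources are stored as S3 objects and using S3's canned ACLs `private` and `public-read`. The helper names are hypothetical and this is not the ckanext-acl implementation. The client argument can be anything with a boto3-style `put_object_acl(Bucket=..., Key=..., ACL=...)` method, so the logic can be exercised without AWS credentials.

```python
# Sketch: keep a resource file's S3 ACL in step with its dataset's visibility.
# Helper names are hypothetical; this is not the ckanext-acl API.
from typing import Any, Dict


def canned_acl_for(dataset: Dict[str, Any]) -> str:
    """Map CKAN's boolean `private` flag onto an S3 canned ACL.

    A missing flag is treated as private, so nothing is exposed by accident.
    """
    return "private" if dataset.get("private", True) else "public-read"


def sync_resource_acl(s3_client: Any, bucket: str, key: str,
                      dataset: Dict[str, Any]) -> str:
    """Apply the matching ACL to one stored object and return it.

    `s3_client` only needs a boto3-style put_object_acl(Bucket=, Key=, ACL=).
    """
    acl = canned_acl_for(dataset)
    s3_client.put_object_acl(Bucket=bucket, Key=key, ACL=acl)
    return acl
```

Treating a missing `private` flag as private is a deliberately conservative choice: a file is only exposed once its dataset is explicitly public.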

Although it is not formally part of the CKAN roadmap, I have a working model of where I’d like CKAN to head when it comes to enterprise file/data storage and access. If you are familiar with the concept of resource views, the idea I’m keen to pursue is similar: a concept of resource containers (not para-virtualization containers, but storage or access point containers). The idea is to make CKAN extendable via a type of extension that lets it do more orchestration around how data is stored and made usable below the metadata discovery layer.

The story would be something like:
As a platform operator, I need to be able to configure a variety of storage and access endpoint possibilities, so that custodians can select where data is placed based on type of data or business need.

Resource container extensions would then be built to accommodate things like:

  1. Big data, transactional data feeds
  2. Semantic lakes
  3. Large file storage blobs
  4. Self-declarative structured data (likely using data packaging/Frictionless Data)
  5. For cost auditing and accountability – storage into specified paid cloud accounts (different AWS, Azure, etc. accounts based on organisation)
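
To make the resource container idea more concrete, here is a minimal Python sketch of how a platform operator might register containers and have a custodian's resource routed to one by data type or organisation. Everything here (`ResourceContainer`, `ContainerRegistry`, the two example containers) is hypothetical. CKAN has no such interface, and a real design would hook into CKAN's plugin system.

```python
# Hypothetical sketch of "resource containers"; not an existing CKAN API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ResourceRequest:
    """What a custodian asks for when placing a resource."""
    organisation: str
    data_kind: str  # e.g. "blob", "structured", "feed"


class ResourceContainer(ABC):
    """A storage or access endpoint an operator can offer to custodians."""
    name = "abstract"

    @abstractmethod
    def accepts(self, req: ResourceRequest) -> bool:
        """Return True if this container can hold the requested resource."""


class BlobStore(ResourceContainer):
    """Shared store for large file blobs."""
    name = "large-file-blobs"

    def accepts(self, req: ResourceRequest) -> bool:
        return req.data_kind == "blob"


class OrgBilledStore(ResourceContainer):
    """Places an organisation's data in its own paid cloud account."""
    name = "org-billed"

    def __init__(self, accounts: Dict[str, str]) -> None:
        self.accounts = accounts  # organisation -> cloud account identifier

    def accepts(self, req: ResourceRequest) -> bool:
        return req.organisation in self.accounts


class ContainerRegistry:
    """Operator-configured list of containers, checked in registration order."""

    def __init__(self) -> None:
        self._containers: List[ResourceContainer] = []

    def register(self, container: ResourceContainer) -> None:
        self._containers.append(container)

    def select(self, req: ResourceRequest) -> Optional[ResourceContainer]:
        # The first container that accepts the request wins.
        for container in self._containers:
            if container.accepts(req):
                return container
        return None
```

Registering an `OrgBilledStore` before the `BlobStore` means an organisation with its own cloud account keeps its blobs there, while other organisations' blobs fall through to the shared store. Registration order is the operator's lever for precedence.
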

I would imagine that resource view and resource container extensions would be paired in many cases, with the view giving greater access to and control over the data so that users can query it and extract insights.

The European Data Portal has around 650k datasets. It is true that once a CKAN portal reaches that size, it can be a chore to do anything across the entire set of data quickly. However, with the entire catalog readable via the API, there is a place for other tools to come into the picture and provide meta analysis or broader views over all the data in a portal. CKAN’s structure allows data ownership and custodianship to remain flexible as the governing entities change over time. If we keep those functions lightweight and build the more intensive data processing tasks within a resource container layer, then I think that is the big win :)

I see datastore and filestore as examples of resource containers. Datapusher is an example of an ETL tool that works with datastore, but similar tools and concepts can be worked into the model, and the open source goodness can grow organically to meet lots of different organisational needs.

Where CKAN differs from other portal software, in my experience, is that it can be used for open Government data, research data, private sector data and ‘data as knowledge’ in virtually any situation. Other portal software appears to be built around capturing a particular market opportunity to generate data as knowledge for a particular customer segment: civic hackers, jurisdictional bureaucrats, open data policy implementations, etc.

CKAN’s harvesting is good, but certainly not perfect. The approach of pushing from CKAN to elsewhere is likely to be used more in our future work, or as we refactor the architecture of current implementations.
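
Reading a whole catalog through CKAN's Action API is mostly a paging exercise: `package_search` accepts `rows` and `start` parameters and reports a total `count`. The Python below is a minimal, standard-library-only sketch of that loop. The page size of 100 is an assumption (portals may cap `rows`), and the fetch function is injectable so the paging logic can be tested offline.

```python
# Sketch: walk every dataset in a CKAN catalog via package_search paging.
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

Fetch = Callable[[int, int], Dict]


def http_fetch(base_url: str) -> Fetch:
    """Return a function that fetches one page of package_search results."""
    def fetch(start: int, rows: int) -> Dict:
        query = urllib.parse.urlencode({"start": start, "rows": rows})
        url = f"{base_url}/api/3/action/package_search?{query}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["result"]  # has "count" and "results"
    return fetch


def iter_all_datasets(fetch: Fetch, rows: int = 100) -> Iterator[Dict]:
    """Yield every dataset dict, page by page, until `count` is reached."""
    start = 0
    while True:
        page = fetch(start, rows)
        results = page["results"]
        yield from results
        start += len(results)
        # Stop on an empty page too, in case `count` is stale.
        if not results or start >= page["count"]:
            break
```

For example, `iter_all_datasets(http_fetch("https://demo.ckan.org"))` would yield each dataset dict in turn. If datasets are added or removed while paging, some may be skipped or repeated, so a meta-analysis over a busy portal may want to de-duplicate on `id`.
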
For the push approach, see: https://github.com/DataShades/ckanext-syndicate

By using multiple CKAN environments, it is pretty easy to have catalogs of ‘working data’ that then push to the ‘published data’ catalog. We use this approach for Government open data where, from the bottom up, agency data is collected into CKAN-based information asset registers. Sometimes the data doesn’t even exist yet, but the data management plan can at least be registered before the dataset is populated with resources. Once the data is ready, it can be published and syndicated upward to a higher-level jurisdictional portal, such as a council, city, state or province. Similarly, such datasets can then be syndicated upward again into a national or regional portal, perhaps with further ETL functions put in place to combine similarly structured data from multiple agencies into a master dataset that presents a larger view of the entire data collection effort.

If the domain of data collection differs, such as in a field of research, the same architecture still applies. Multiple chemistry research schools, for example, could publish working data locally and then syndicate upward into a global repository that allows meta analysis of research outcomes across the entire domain. We’re working on a project in just this manner, described here: http://linkdigital.com.au/news/2016/09/building-mdbox-an-open-access-simulation-data-repository-on-ckan-and-aws

Lastly, published open data is the result of effort put into a process of data collection and, usually, some analysis and clean-up. The tools used to prepare, collect, process or visualise data are all part of the value a dataset represents. To bridge data and code, we’ve released a very simple resource view for GitHub repositories, which can be found here: https://github.com/DataShades/ckanext-githubrepopreview

Open Government initiatives are formed around principles of transparency, participation and collaboration.
There is a desire to enable public-private collaboration over the long term, and there is a role for Government to act as impresario, stimulating new markets and economic activity by publishing open data (ref: https://www.nesta.org.uk/sites/default/files/government_as_impresario.pdf). We built the GitHub resource view to encourage open source projects to emerge around public datasets, by linking the discovery of helpful code with the discovery of helpful datasets. Sorry for the long monologue! I could have more succinctly just said CKAN rocks, check out all the open source goodness surrounding it and jump in :)
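
The upward syndication from a ‘working data’ catalog to a ‘published data’ catalog can be sketched against CKAN's Action API: read a package dict from the working catalog (for example via `package_show`), strip fields that only make sense in the source, and create it on the published catalog with `package_create`. This is a minimal Python sketch and not how ckanext-syndicate works. The list of stripped fields and the target organisation mapping are assumptions, and `push` needs a real target URL and API key to do anything.

```python
# Sketch: syndicate one dataset upward between CKAN catalogs.
# Not the ckanext-syndicate implementation; field handling is assumed.
import json
import urllib.request
from typing import Any, Dict

# Fields tied to the source catalog that should not be copied (assumed list).
SOURCE_ONLY_FIELDS = {"id", "owner_org", "organization", "revision_id",
                      "metadata_created", "metadata_modified"}


def prepare_for_syndication(pkg: Dict[str, Any], target_org: str) -> Dict[str, Any]:
    """Return a new package dict without source-specific fields or resource
    ids, assigned to an organisation on the target catalog."""
    out = {k: v for k, v in pkg.items() if k not in SOURCE_ONLY_FIELDS}
    out["owner_org"] = target_org
    out["resources"] = [
        {k: v for k, v in res.items() if k not in {"id", "package_id"}}
        for res in pkg.get("resources", [])
    ]
    return out


def push(target_url: str, api_key: str, pkg: Dict[str, Any]) -> Dict[str, Any]:
    """POST the prepared dict to the target catalog's package_create action."""
    req = urllib.request.Request(
        f"{target_url}/api/3/action/package_create",
        data=json.dumps(pkg).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]
```

Keeping `prepare_for_syndication` free of network calls means the field mapping, which is where jurisdictions tend to differ, can be tested and adjusted on its own before anything is pushed.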

Registration Open for CKANCon and Call for Speakers Closing This Friday

- August 18, 2016 in Featured

We are less than two months from CKANCon 2016, an official pre-event of this year’s International Open Data Conference, taking place in Madrid on October 4!
As our community continues to grow rapidly, CKANCon will be a great opportunity to learn more about what others are doing with CKAN, and how you can use it in your organization.
We’ve had significant interest in speaking opportunities for this year’s event, which is wonderful to see! Many speaking applications have come in over the past few days, so we are extending the deadline for speaker requests to this Friday, August 19. If you are interested in speaking, please fill out the CKANCon Speaker Request form before 12:00 a.m. EST this Friday.
Finally, we are happy to announce that registration for the event is now open! You can register today for both in-person and online participation!
Looking forward to seeing everyone soon!

Open Knowledge Australia Board Meeting 29 September 2015

- October 1, 2015 in Featured, News

Minutes will be available once approved.

Pyramids, Pipelines and a Can-of-Sweave – CKAN Asia-Pacific Meetup

- September 18, 2015 in open data, international, Open Data Index

Florian Mayer from the Western Australian Department of Parks and Wildlife presents various methods he is using to create Wisdom.

Data+Code = Information;
Information + Context = Wisdom

So, can this be done with workbooks, applications and active documents? As Florian might say, “Yes it CKAN”!

Grab the code and materials related to the work from here:
http://catalogue.alpha.data.wa.gov.au/dataset/data-wa-gov-au

This presentation was given at the first Asia-Pacific CKAN meetup on the 17th of September, hosted at Link Digital, as an initiative of the CKAN Community and Communications team. You can join the meetup and come along to these fortnightly sessions via video conference.

If you have some interesting content to present then please get in touch with @starl3n to schedule a session.

Walkthrough for 2015 Global Open Data Census Updates

- September 16, 2015 in Canberra, Featured, Government data

Last week at the Canberra meetup we were joined by Stephen Gates to discuss the Global Open Data Census. After organising a demo of the update process, we’ve produced the following video to give everyone an easy-to-follow walkthrough of how to submit updates for this year. Please note that the deadline for updates is the 20th of September, so we’d really appreciate some help over this week. The video also covers the state-level and local-government-level indexes. Submissions for these can be made at any time, and the process is just as easy as for the global census. You can follow Australian developments around this project on Twitter.