Working with UNHCR to better collect, archive and re-use data about some of the world’s most vulnerable people

- January 7, 2022 in ckan, Interviews, News, OKI Projects, Open Knowledge, Open Knowledge Foundation

Since 2018, the team at Open Knowledge Foundation has been working with the Raw Internal Data Library (RIDL) project team at UNHCR to build an internal library of data to support evidence-based decision making by UNHCR and its partners.

What’s this about? 

The United Nations High Commissioner for Refugees (UNHCR) is a global organisation ‘dedicated to saving lives, protecting rights and building a better future for refugees, forcibly displaced communities and stateless people’.

Around the world, at least 82 million people have been forced to flee their homes. Many of these people are refugees and asylum seekers. Over half are internally displaced within the borders of their own country. The vast majority of these people are hosted in developing countries. Learn more here.

UNHCR has a presence in 125 countries, with more than 90% of staff based in the field. An important dimension of their work involves collecting and using data – to understand what is happening, to whom and where, and what should be done about it.

In the past, managing this data has been a huge challenge. Data was collected in a decentralised manner. It was then stored, archived and processed in a decentralised manner. This meant that much of the value of this data was lost. Insights went undiscovered. Opportunities were missed.

In 2019, the UNHCR released its Data Transformation Strategy 2020 – 2025 – with the vision of UNHCR becoming ‘a trusted leader on data and information related to refugees and other affected populations, thereby enabling actions that protect, include and empower’.

The Raw Internal Data Library (RIDL) supports this strategy by creating a safe, organised place for UNHCR to store its data, with metadata that helps staff find the data they need and enables them to re-use it in multiple types of analysis.

Since 2018, the team at Open Knowledge Foundation has been working with the RIDL team to build this library using CKAN – the open source data management system.

OKF spoke with Mariann Urban at UNHCR Global Data Service about the project to learn more. 

Here is an extract of that interview, which has been edited for length and clarity.


Hi Mariann. Can you start by telling us why data is important for UNHCR?

MU/UNHCR: That’s a great question. Pretty much everyone at UNHCR now recognises that good data is the key to achieving meaningful solutions for displaced people. It’s important to enable evidence-based decision making and to deliver our mandate. And also, it helps us raise awareness and demonstrate the impact of our work. Data is at the foundation of what UNHCR does. It’s also important for building strong partnerships with governments and other organisations. When we share this data, anonymised where necessary, it allows our partners to design their programmes better. Data is critical to generate better knowledge and insights. Secondary usage includes indicator baseline analysis, trend analysis, forecasting, modeling etc. Data is really valuable!

What kinds of datasets does UNHCR collect and use?

MU/UNHCR: We have people working in countries all over the world, most of them in the field. Every year UNHCR spends a huge amount of money collecting data. It’s a huge investment. Much of this data collection happens at the field level, organised by our partners in operations. They collect a multitude of operational data each year.

You must have lots of interesting data. Can you give us an example of one important dataset?

MU/UNHCR: One of the most valuable datasets is our registration data. Registering refugees and asylum seekers is the primary responsibility of governments. But if they require help, UNHCR provides support in that area.

How was data collected, archived and used at UNHCR in the past?

MU/UNHCR: Let me give you an example of how it used to be. In the past, let’s imagine, there was a data collection exercise in Cameroon. Our colleagues finished the exercise, and the data stayed with the partner organisation, or sometimes with the actual person collecting the data. It was stored on hard drives, shared drives, email accounts and so on. Then, the next person who wanted to work with the data, or a similar dataset, probably had no access to it to use as a baseline or for trend analysis.

That sounds like a problem.

MU/UNHCR: Yes! This was the problem statement that led to the idea of the Raw Internal Data Library (RIDL). Of course, we already have corporate data archiving solutions. But we realised we needed something more.

Tell us more about RIDL

MU/UNHCR: The main goal of RIDL is to stop data loss. We know that the organisation cannot capitalise on data if it is lost or forgotten, or if it is not stored in an interoperable, machine-readable format with the minimum set of metadata needed to ensure appropriate further use.

RIDL is built on CKAN. Why is that?

MU/UNHCR: Our team had some experience with CKAN, which is already used in the humanitarian data community. UNHCR has been an active user of OCHA’s Humanitarian Data Exchange (HDX) platform to share aggregate data externally, and we closely collaborate with its technical team. After some market research, we realised that CKAN was also a good solution for an internal library – the data is internal, but it needs to be visible to a lot of people inside the organisation.
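This is standard CKAN functionality rather than RIDL’s own code: a minimal sketch, using the ckanapi client library with a hypothetical internal portal URL, organisation name and metadata fields, of how a dataset can be registered with a minimum set of metadata while the data itself stays restricted.

from ckanapi import RemoteCKAN  # pip install ckanapi

# Connect to a hypothetical internal CKAN instance with an API key.
ckan = RemoteCKAN("https://ridl.example.org", apikey="my-api-key")

# Register a dataset with a minimum set of metadata. private=True keeps
# the dataset restricted to members of the owning organisation.
ckan.action.package_create(
    name="example-survey-2021",
    title="Example household survey 2021",
    notes="Raw survey data collected during a field operation.",
    owner_org="example-operation",
    private=True,
)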

What about external partners and the media? Can they access RIDL datasets?

MU/UNHCR: There are some complicated issues around privacy and security. Some of the data we collect is extremely sensitive. We have to be strong custodians of this data to ensure it is used appropriately. Once we analyse the data, we can take the next step and share it externally, of course. Sometimes our data includes personal identifiers, so it must be cleaned and anonymised to ensure that data subjects are not identifiable. Once we have a dataset that is anonymised, we use our Microdata Library to publish it externally. Thus RIDL is the first step in a long chain of sharing our data with partners, governments, researchers and the media.

RIDL is a technological solution. But I imagine there is some cultural change required for UNHCR to reach its vision of becoming a data-enabled organisation.

MU/UNHCR: Yes, of course. Achieving these aspirations is not just about getting the technology right. We also have to make cultural, procedural and governance changes to become a data-enabled organisation. It’s a huge project. It needs a culture shift in UNHCR, because even if the library is internal, it takes some work to convince people to upload their data. The metadata is always visible to everyone internally, but the actual data itself can be restricted and only made visible following a request and evaluation. We want to be a trusted leader, but we also want to use that data to arrive at better solutions for refugees, to enrich our partnerships, and to enable evidence-based decision making – which is what we always aim to do.

Thanks for sharing your insights with us today Mariann. 

MU/UNHCR: No problem. It’s been a pleasure. 


Find out more

Open Knowledge Foundation is working with UNHCR to deliver the Raw Internal Data Library (RIDL). If you work outside of UNHCR, you can access UNHCR’s Microdata Library here. Learn more about CKAN here. 

If your organisation needs a Data Library solution and you want to learn more about our work, email info@okfn.org. We’d love to talk to you!

Introducing Datashades.info, a CKAN Community Service

- September 23, 2019 in ckan

Do you use CKAN to power an open data portal? In this guest post, Link Digital explains how you can take advantage of their latest open data initiative, Datashades.info.

Datashades.info is a tool designed to deliver insights for researchers, portal managers and the wider tech community, to inform and support open data efforts relating to data hosted on CKAN platforms. Link Digital created the online service through a number of alpha releases and considers Datashades.info, now in beta, a long-term initiative they expect to improve with more features in future releases.

Specifically, Datashades.info provides a publicly accessible index of metadata and statistics on CKAN data portals across the globe. For each portal, statistics are aggregated and presented on the number of datasets, users, organisations and dataset tags. These statistics give portal managers the ability to quickly compare the size and scope of CKAN data portals, to help inform their development roadmaps. Moreover, for each portal, installed plugin information is collected along with the relative penetration of those plugins across all portals in the index. This enables CKAN developers to quickly see which extensions are the most popular and on which portals they are being used. Finally, all historical data is persisted and kept publicly accessible, allowing researchers to analyse historical trends in any indexed CKAN portal.

Datashades.info was built to support a crowd-sourced indexing scheme. If a visitor searches for a CKAN portal that is not yet in the index, the system will immediately query that portal and attempt to generate a new index entry on the fly. Aggregation of a new portal’s statistics into Datashades.info also happens automatically. Make the most of the tool with the following features:

Globally Accessible open data

With Datashades.info, you can easily access an index of metadata and statistics on CKAN data portals across the globe. To do this, simply type in the portal’s URL on the homepage then click “Search“.

Integrated Values of All Metrics

After entering a portal’s URL, Datashades.info will load its information. After a few seconds, you will be able to see a range of data on portal users, datasets, resources, organisations, tags and plugins. Portal managers can access these via the individual portal page found on the site.
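Under the hood, these figures correspond to standard CKAN action API calls. This is a minimal sketch, not Datashades.info’s own code; the portal URL is a placeholder, and the endpoints used (package_search, organization_list, tag_list, status_show) are part of CKAN’s public API.

import requests

PORTAL = "https://demo.ckan.org"  # placeholder portal URL

def action(name, **params):
    # Call a CKAN action API endpoint and return its 'result' payload.
    resp = requests.get(PORTAL + "/api/3/action/" + name, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["result"]

# A search with rows=0 returns no records but still reports the total count.
dataset_count = action("package_search", rows=0)["count"]

# Organisation and tag counts.
org_count = len(action("organization_list"))
tag_count = len(action("tag_list"))

# Installed extensions, as reported by the portal itself.
extensions = action("status_show")["extensions"]

print(dataset_count, org_count, tag_count, extensions)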

Easily-tracked Historical Data

Want to revisit data you previously explored? The tool also keeps old data in a historical index which users can explore any time on any portal page or by clicking “View All Data Portals” on the homepage.

Crowdsourcing

Datashades.info uses crowdsourcing to build its index. This means users can easily add any CKAN data portal not yet found on the site. To do this, simply search for a portal you know, and it will automatically be added to the site and to the global statistics. As the project is still at a beta level of maturity, there is room for improvement in many areas. But with continuous feedback from the CKAN community, expect more data and features in future releases. For now, have a look around and stay tuned!

Statement from the Open Knowledge Foundation Board on the future of the CKAN Association

- June 6, 2019 in ckan, Open Data, Open Knowledge, Open Knowledge Foundation

The Open Knowledge Foundation (OKF) Board met on Monday evening to discuss the future of the CKAN Association.

The Board supported the CKAN Stewardship proposal jointly put forward by Link Digital and Datopian. As Link Digital and Datopian are two of the longest-serving members of the CKAN community, the Board felt their proposal would move CKAN forward, strengthening both the platform and the community.

In appointing joint stewardship to Link Digital and Datopian, the Board felt there was a clear practical path with strong leadership and committed funding to see CKAN grow and prosper in the years to come.

OKF will remain the ‘purpose trustee’ to ensure the Stewards remain true to the purpose and ethos of the CKAN project. The Board would like to thank everyone who contributed to the deliberations and we are confident CKAN has a very bright future ahead of it.

If you have any questions, please get in touch with Steven de Costa, managing director of Link Digital, or Paul Walsh, CEO of Datopian, by emailing stewards@ckan.org.

CKANconUS and Code for America Summit: some thoughts about the important questions

- June 20, 2018 in ckan, code for america, Events, OK US, USA

It’s been a few weeks since CKANconUS and the seventh Code for America Summit took place in Oakland. As always, it was a great place to meet old friends and new faces among technologists, policy experts and government innovators in the U.S. In this blogpost I share some of the experience of attending these two conferences and a few thoughts I’ve been ruminating on about the discussions that happened and, more importantly, those that didn’t.

CKAN is an open source open data portal platform that Open Knowledge International developed several years ago. It has been used and reused by many governments and civil society organizations around the world. For CKANconUS, the OK US group, led by Joel Natividad, organized a one-day event with different users and implementers of CKAN around the United States. We had the California-based LA Counts, gathering data from the 88 cities in the County of Los Angeles, and the California Data Collaborative, working to improve water management decisions. We also had some interesting presentations from the GreenInfo Network and the California Natural Resources Agency. And we had the chance to hear about the awesome process of the Western Pennsylvania Regional Data Center in choosing CKAN as its platform and how they maintain the project (the presentation included LEGOs in every slide).

On the more technical side, David Read, Ian Ward and our own Adrià Mercader talked about the new versions of CKAN, the Express Loader and the Technical Roadmap for CKAN, 11 years after its development started. You can view the slides by Adrià Mercader on the CKAN Technical Roadmap overview here. We closed with some great lightning talks about datamirror.org, which ensures access to federal research data, and about Human Centered Design and what Amanda Damewood learned from working in government on improving these processes.

The next two days at the Code for America Summit were full of interesting talks about building tools, innovating in our processes and making government work for people in a better way. There were some interesting keynote speakers as well as breakout sessions where we discussed the process of building certain projects and how we can rethink how we engage in our communities.

I would like to highlight two mainstage talks about collaboration (or the difficulty of it) between government and civil society. The first was a talk and panel about disasters in Puerto Rico, Houston and cities in Florida, which raised key points about the importance of having accurate, verifiable and usable information in these cases, as well as the importance of having a network of people who are willing to help their peers. The second was the presentation by Code for Asheville regarding their issues with homelessness and police data. This isn’t necessarily what you would call a success story, but Sabrah n’haRaven made a great point about working with social issues: “Trust effective communities to understand their own problems”. This may sound like a given in the work we do with data and the things we build with it, but it’s something we need to keep in mind. In this line of thought, it seems crucial to keep these conversations going. We need to understand our communities, be aware that there are policies that go against the rights of people to live a fulfilling life, and work to change that. I hope that at the next CfA Summit and CKANconUS we can try to find some answers to these questions collectively.

Validation for Open Data Portals: a Frictionless Data Case Study

- December 18, 2017 in case study, ckan, Data Quality, Frictionless Data, goodtables

The Frictionless Data project is about making it effortless to transport high quality data among different tools and platforms for further analysis. We are doing this by developing a set of software, specifications and best practices for publishing data. The heart of Frictionless Data is the Data Package specification, a containerization format for any kind of data based on existing practices for publishing open-source software. Through its pilots, Frictionless Data is working directly with organisations to solve real problems managing data. The University of Pittsburgh’s Center for Urban and Social Research is one such organisation.

One of the main goals of the Frictionless Data project is to help improve data quality by providing easy-to-integrate libraries and services for data validation. We have integrated data validation seamlessly with different backends like GitHub and Amazon S3 via the online service goodtables.io, but we also wanted to explore closer integrations with other platforms. An obvious choice for that is Open Data portals. They are still one of the main forms of dissemination of Open Data, especially for governments and other organizations. They provide a single entry point to data relating to a particular region or thematic area, and give users tools to discover and access different datasets. On the backend, publishers also have tools available for the validation and publication of datasets.

Data quality varies widely across different portals, reflecting the publication processes and requirements of the hosting organizations. In general, it is difficult for users to assess the quality of the data, and there is a lack of descriptors for the actual data fields. At the publisher level, while strong emphasis has been put on metadata standards and interoperability, publishers don’t generally have the same help or guidance when dealing with data quality or description. We believe that data quality in Open Data portals can have a central place on both these fronts, user-centric and publisher-centric, and we started this pilot to showcase a possible implementation.

To field-test our implementation we chose the Western Pennsylvania Regional Data Center (WPRDC), managed by the University of Pittsburgh Center for Urban and Social Research. WPRDC is a great example of a well managed Open Data portal, where datasets are actively maintained and the portal itself is just one component of a wider Open Data strategy. It also provides a good variety of publishers, including public sector agencies, academic institutions, and nonprofit organizations.

The portal software that we are using for this pilot is CKAN, the world-leading open source software for Open Data portals (source). Open Knowledge International initially fostered the CKAN project and is now a member of the CKAN Association. We created ckanext-validation, a CKAN extension that provides a low-level API and readily available features for data validation and reporting that can be added to any CKAN instance. This is powered by goodtables, a library developed by Open Knowledge International to support the validation of tabular datasets.

The ckanext-validation extension allows users to perform data validation against any tabular resource, such as CSV or Excel files. This generates a report that is stored against a particular resource, describing issues found with the data, both at the structural level, such as missing headers and blank rows, and at the data schema level, such as wrong data types and out-of-range values.
In the coming days, you can read the technical details about this pilot study, our learnings and the areas we have identified for further work here on the Frictionless Data website.
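For a sense of the mechanism, here is a minimal sketch using the goodtables library on its own; the file name is hypothetical, and the report layout shown in the comments follows goodtables 1.x.

from goodtables import validate

# Run structural and schema checks against a tabular file.
report = validate("data.csv")  # placeholder file name

if not report["valid"]:
    for table in report["tables"]:
        for error in table["errors"]:
            # Each error carries a code (e.g. 'blank-row', 'missing-header'),
            # the position where it was found, and a readable message.
            print(error["code"], error.get("row-number"), error["message"])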

Installation and deployment guide for CKAN 2.7.2

- October 18, 2017 in ckan

CKAN is widely used in many countries, including the United States and the United Kingdom, for sharing catalogues of data and metadata.
To install CKAN, you can easily set up the packaged version by following the official CKAN documentation (http://docs.ckan.org/en/ckan-2.7.0/maintaining/installing/index.html),
but if you run an operating system other than Linux Ubuntu 14.04 (LTS), you have to install CKAN from its source files.
When installing CKAN from source, you may hit errors not covered by the official documentation, caused by the dependencies of the extra components CKAN needs (required packages, libraries, web server, etc.) and by other environment settings.
This post therefore shares the errors that can occur during a source install and how to resolve them.
The installation environment is as follows; please refer to the official documentation alongside this guide.
Operating system: Ubuntu (16.04 LTS), as of October 2017
CKAN: CKAN 2.7.0 source files
A. Installing CKAN

1. Install the required packages
> On Ubuntu 16.04, the packages required to run CKAN can be installed as follows:
$ sudo apt-get install python-dev postgresql libpq-dev python-pip python-virtualenv git-core solr-jetty openjdk-8-jdk redis-server

2. Install CKAN into a Python virtual environment
a. Create a Python virtual environment (virtualenv) to install CKAN into, and activate it:
$ sudo mkdir -p /usr/lib/ckan/default
$ sudo chown `whoami` /usr/lib/ckan/default
$ virtualenv --no-site-packages /usr/lib/ckan/default
$ . /usr/lib/ckan/default/bin/activate
> Note: everything from here on is run inside the virtual environment.
> If a step fails, the following steps are likely to fail as well, so make sure each step completes successfully before moving on. (Even if no error appears along the way, CKAN may still fail to run at final deployment.)
b. Install the CKAN source code into your virtualenv. To install the latest stable release of CKAN (CKAN 2.7.0), run:
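> Following the official CKAN 2.7 source-install documentation:
$ pip install -e 'git+https://github.com/ckan/ckan.git@ckan-2.7.0#egg=ckan'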
> As of September 2017, the latest version in the git repository is 2.7.0. You can also install a different version by changing the version name.
[Error Tip] > If git reports a ‘head’ mismatch error, you can install without pinning a specific version, as shown below.
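> Following the official CKAN documentation, the unpinned variant is:
$ pip install -e 'git+https://github.com/ckan/ckan.git#egg=ckan'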
c. Install the recommended version of ‘setuptools’:
$ pip install -r /usr/lib/ckan/default/src/ckan/requirement-setuptools.txt
> The setuptools version used here is 36.6.
d. Install the Python modules that CKAN requires into your virtualenv:
$ pip install -r /usr/lib/ckan/default/src/ckan/requirements.txt
[Note] Deactivate and reactivate your virtualenv, to make sure you’re using the virtualenv’s copies of commands like paster rather than any system-wide installed copies:
$ deactivate
$ . /usr/lib/ckan/default/bin/activate

3. Setup a PostgreSQL database
a. Check that PostgreSQL was installed correctly by listing the existing databases
$ sudo -u postgres psql -l
b. Next you’ll need to create a database user if one doesn’t already exist. Create a new PostgreSQL database user called ckan_default, and enter a password for the user when prompted. You’ll need this password later:
 $ sudo -u postgres createuser -S -D -R -P ckan_default
c. Create a new PostgreSQL database, called ckan_default, owned by the database user you just created:
$ sudo -u postgres createdb -O ckan_default ckan_default -E utf-8

4. Create a CKAN config file
a. Create a directory to contain the site’s config files:
$ sudo mkdir -p /etc/ckan/default
$ sudo chown -R `whoami` /etc/ckan/
$ sudo chown -R `whoami` ~/ckan/etc (only needed if you set up the CKAN path under your /home directory.)
b. Create the CKAN config file:
$ . /usr/lib/ckan/default/bin/activate
> Re-enter the virtual environment before starting the following configuration steps.
$ paster make-config ckan /etc/ckan/default/development.ini
Edit the development.ini file in a text editor, changing the following options:
1) sqlalchemy.url
sqlalchemy.url = postgresql://ckan_default:pass@localhost/ckan_default
> Replace pass with the password you chose when creating the database user in step 3-b above.
2) site_id
Each CKAN site should have a unique site_id, for example:
ckan.site_id = default
3) site_url
Provide the site’s URL (used when putting links to the site into the FileStore, notification emails etc). For example:
ckan.site_url = http://demo.ckan.org
> Enter your domain here if you have one; otherwise use http://localhost or, if you have a static IP, that IP address. This URL becomes the base path for links on the site (sign-up and so on), so set it carefully.

5. Setup Solr
Edit the Jetty configuration file (/etc/default/jetty8) and change the following variables:
a. Setting Solr
$ sudo vi /etc/default/jetty8 (uncomment the three lines below)
NO_START=0            # (line 4)
JETTY_HOST=127.0.0.1  # (line 16)
JETTY_PORT=8983       # (line 19)
b. Start or restart the Jetty server. For Ubuntu 16.04:
$ sudo service jetty8 restart
c. Check welcome page of Solr, http://localhost:8983/solr/
d. Replace the default schema.xml file with a symlink to the CKAN schema file included in the sources.
$ sudo mv /etc/solr/conf/schema.xml /etc/solr/conf/schema.xml.bak
$ sudo ln -s /usr/lib/ckan/default/src/ckan/ckan/config/solr/schema.xml /etc/solr/conf/schema.xml
$ sudo service jetty8 restart

6. Create database tables
Create the database tables:
$ . /usr/lib/ckan/default/bin/activate
$ cd /usr/lib/ckan/default/src/ckan
$ paster db init -c /etc/ckan/default/development.ini

7. Link to who.ini
$ ln -s /usr/lib/ckan/default/src/ckan/who.ini /etc/ckan/default/who.ini
$ cd /usr/lib/ckan/default/src/ckan
$ paster serve /etc/ckan/default/development.ini
Connecting with a web browser, you should now see the CKAN start page.

B. Preparing for deployment once CKAN is installed (Deploying a source install)
1. Create a production.ini File
$ cp /etc/ckan/default/development.ini /etc/ckan/default/production.ini

2. Install Apache, modwsgi, modrpaf
$ sudo apt-get install apache2 libapache2-mod-wsgi libapache2-mod-rpaf

3. Install Nginx
$ sudo apt-get install nginx
[Note] If a dependency error occurs while installing nginx, reinstall it as follows:
$ sudo apt-get purge nginx-full nginx-common
$ sudo apt-get install nginx-full

4. Create the WSGI script file
Create your site’s WSGI script file /etc/ckan/default/apache.wsgi with the following contents:
$ sudo vi /etc/ckan/default/apache.wsgi
import os
activate_this = os.path.join('/usr/lib/ckan/default/bin/activate_this.py')
execfile(activate_this, dict(__file__=activate_this))
from paste.deploy import loadapp
config_filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'production.ini')
from paste.script.util.logging_config import fileConfig
fileConfig(config_filepath)
application = loadapp('config:%s' % config_filepath)

5. Create the Apache config file
$ sudo vi /etc/apache2/sites-available/ckan_default.conf
<VirtualHost 127.0.0.1:8080>
    ServerName <your DNS name>
    ServerAlias <an alternate name for the same DNS>
    WSGIScriptAlias / /etc/ckan/default/apache.wsgi
    # Pass authorization info on (needed for rest api).
    WSGIPassAuthorization On
    # Deploy as a daemon (avoids conflicts between CKAN instances).
    WSGIDaemonProcess ckan_default display-name=ckan_default processes=2 threads=15
    WSGIProcessGroup ckan_default
    ErrorLog /var/log/apache2/ckan_default.error.log
    CustomLog /var/log/apache2/ckan_default.custom.log combined
    <IfModule mod_rpaf.c>
        RPAFenable On
        RPAFsethostname On
        RPAFproxy_ips 127.0.0.1
    </IfModule>
    <Directory />
        Require all granted
    </Directory>
</VirtualHost>

6. Modify the Apache ports.conf file
$ sudo vi /etc/apache2/ports.conf
 Listen 8080 (change Listen 80 to Listen 8080)

7. Create the Nginx config file
Create your site’s Nginx config file at /etc/nginx/sites-available/ckan, with the following contents:
$ sudo vi /etc/nginx/sites-available/ckan
proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=cache:30m max_size=250m;
proxy_temp_path /tmp/nginx_proxy 1 2;
server {
    client_max_body_size 100M;
    location / {
        proxy_pass http://127.0.0.1:8080/;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_cache cache;
        proxy_cache_bypass $cookie_auth_tkt;
        proxy_no_cache $cookie_auth_tkt;
        proxy_cache_valid 30m;
        proxy_cache_key $host$scheme$proxy_host$request_uri;
        # In emergency comment out line to force caching
        # proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
    }
}

8. Enable your CKAN site
To prevent conflicts, disable your default nginx and apache sites. Finally, enable your CKAN site in Apache.
$ sudo a2ensite ckan_default
$ sudo a2dissite 000-default
$ sudo rm -vi /etc/nginx/sites-enabled/default
$ sudo ln -s /etc/nginx/sites-available/ckan /etc/nginx/sites-enabled/ckan_default
$ sudo service apache2 reload
$ sudo service nginx reload
Deployment complete.

[Additional notes]
Command to grant [User1] sysadmin rights so they can add datasets:
$ paster --plugin=ckan sysadmin add [User1] -c /etc/ckan/default/production.ini

New open energy data portal set to spark innovation in energy efficiency solutions

- June 22, 2017 in ckan, Viderum

Viderum spun off as a company from Open Knowledge International in 2016 with the aim to provide services and products to further expand the reach of open data around the world. Last week they made a great step in this direction by powering the launch of the Energy Data Service portal, which will make Denmark’s energy data available to everyone. This press release has been reposted from Viderum‘s website at http://www.viderum.com/blog/2017/06/17/new-open-energy-data-portal-set-to-spark-innovation.

Image credit: Jürgen Sandesneben, Flickr CC BY

A revolutionary new online portal, which gives open access to Denmark’s energy data, is set to spark innovation in smart, data-led solutions for energy efficiency. The Energy Data Service, launched on 17 June 2017 by the CEO of Denmark’s state-owned gas and electricity provider Energinet and the Minister for Energy, Utilities and Climate, will share near real-time aggregated energy consumption data for all Danish municipalities, as well as data on CO2 emissions, energy production and the electricity market. Developers, entrepreneurs and companies will be able to access and use the data to create apps and other smart data services that empower consumers to use energy more efficiently and flexibly, saving them money and cutting their carbon footprint.

Viderum is the technology partner behind the Energy Data Service. It developed the portal using CKAN, the leading data management platform for open data, originally developed by non-profit organisation Open Knowledge International. Sebastian Moleski, CEO of Viderum, said: “Viderum is excited to be working with Energinet at the forefront of the open data revolution to make Denmark’s energy data available to everyone via the Energy Data Service portal. The portal makes a huge amount of complex data easily accessible, and we look forward to developing its capabilities further in the future, eventually providing real-time energy and CO2 emissions data.”

Energinet hopes that the Energy Data Service will be a catalyst for the digitalisation of the energy sector and for green innovation and economic growth, both in Denmark and beyond. “As we transition to a low carbon future, we need to empower consumers to be smarter with how they use energy. The Energy Data Service will enable the development of innovative data-based solutions to make this possible. For example, an electric car that knows when there is spare capacity on the electricity grid, making it a good time to charge itself. Or an app that helps local authorities understand energy consumption patterns in social housing, so they can make improvements that will save money and cut carbon”, said Peder Ø. Andreasen, CEO of Energinet.

The current version of the Energy Data Service includes the following features:
  • API (Application Programme Interface) access to all raw data, which makes it easy to use in data applications and services
  • Downloadable data sets in regular formats (CSV and Excel)
  • Helpful user guides
  • Contextual information and descriptions of data sets
  • Online discussion forum for questions and knowledge sharing

Three ways ROUTETOPA promotes Transparency

- March 14, 2017 in ckan, Open Data, Open Knowledge, Route to PA

Data sharing has come a long way over the years. With open source tools, improvements and new features are always quickly on the rise. Serah Rono looks at how ROUTETOPA, a Horizon2020 project, advocates for transparency.

From as far back as the age of enlightenment, the human race has worked hard to hold authorities accountable. Long-term advocates of open data agree that governments are custodians, rather than owners, of the data in their keep and should therefore make the information they are charged with safekeeping available for public scrutiny and use. Privacy and national security concerns are some of the most common barriers to absolute openness in governments, and in institutions in general, around the world.

As more governments and organisations embrace the idea of open data, some end up, inadvertently, holding back on releasing data they believe is not ready for the public eye, a phenomenon known as ‘data-hugging’. In other instances, governments and organisations end up misleading the general public about the actual quantity and quality of information they have made public. This is usually a play at politics, a phenomenon referred to as ‘open-washing’, and is very frustrating to the open data community. It does not always stop there: some organisations notoriously exaggerate the impact of their open data work, a phenomenon Andy Nickinson refers to as ‘open-wishing’.

The  Horizon2020 project, Raising Open and User-Friendly Transparency Enabling Technologies for Public Administrations (ROUTETOPA), works to bridge the gap between open data users and open data publishers. You can read the project overview in this post and find more information on the project here.

In an age of open-washing and data-hugging, how does ROUTETOPA advocate for transparency?

1. ROUTETOPA leads by example!

The source code for ROUTETOPA tools is open source and lives in this repository. ROUTETOPA also used CKAN, a renowned data portal platform, as the basis for its Transparency Enabling Toolkit (TET). TET provides public administrators in ROUTETOPA’s pilot cities with a platform to publish and open up their data to the public. You can read more about it here. 

2. Data publishers as pilot leads

ROUTETOPA pilots are led by public administrators. This ensures that public administrators are publishing new data regularly and that they are also at hand to answer community questions, respond to community concerns and spearhead community discussions around open data in the five pilot cities.

3. Use of online and offline communication channels

Not only does ROUTETOPA have an active social media presence on Facebook, Twitter and YouTube, it also has its own social media platform, the Social Platform for Open Data (SPOD), which provides a much-needed avenue for open data discourse between data publishers and users. The pilots in Prato, Groningen, Dublin, Issy and Den Haag also hold regular workshops, focus groups and tool test parties. Offline engagement is more relatable, creates rapport between public administrations and citizens, and is also a great avenue for making data requests.

The ROUTETOPA consortium also runs an active blog that features project updates and lessons learnt along the way. Workshops and focus groups are a key part of the success of this project, as user feedback informs the development process of ROUTETOPA tools.

ROUTETOPA partners also attend and spread the word at open data conferences and seminars, to keep the open data community across Europe in the know, and as an avenue to invite the community to test the tools, give feedback and, if it suits, adapt the tools for use in their organisations, institutions and public administrations.

Need clarification, or want to plug in and be a part of ROUTETOPA’s progress? Write to serah.rono@okfn.org. Stay open!

7 ways the ROUTE-TO-PA project has improved data sharing through CKAN

- February 27, 2017 in ckan, Route to PA

Data sharing has come a long way over the years. With open source tools, improvements and new features are always quickly on the horizon. Serah Rono looks at the improvements that have been made to the open source data management system CKAN through the course of the ROUTE-TO-PA project.

In the present day, 5MB worth of data would probably be a decent photo, a three-minute song or a spreadsheet. Nothing worth writing home about, let alone splashing across the front pages of mainstream media. This was not the case in 1956, though: in September of that year, IBM made the news by creating a 5MB hard drive. It was so big, a crane was used to lift it onto a plane. Two years later, in 1958, the World Data Centre was established to allow users open access to scientific data. Over the years, data storage and sharing options have evolved to be more portable, more secure and, with the blossoming of the Internet, virtual, too.

One such virtual data sharing platform, CKAN, has been up and running for ten years now. CKAN is a powerful data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. It is no wonder, then, that ROUTE-TO-PA, a Horizon2020 project pushing for transparency in public administrations across the EU, chose CKAN as a foundation for its Transparency Enhancing Toolset (TET). As one of ROUTE-TO-PA’s tools, the Transparency Enhancing Toolset provides data publishers with a platform on which they can open up data in their custody to the general public.

So, what improvements have been made to the CKAN base code to constitute the Transparency Enhancing Toolset? Below is a brief list:

1. Content management system support

Integrating CKAN with a content management system enables publishers to easily publish content related to datasets and updates related to the portal. The TET WordPress plugin integrates seamlessly with a TET-enabled CKAN, giving publishers rich content publishing features and an elegantly organised entry point to the data portal.

2. PivotTable

The CKAN platform has limited data analysis capabilities, which are essential for working with data. ROUTE-TO-PA added a PivotTable feature to allow users to view, summarise and visualise data. From the data explorer in this example, users can easily create pivot tables and even run SQL queries. See the source code here.
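Those SQL queries run against CKAN’s DataStore. As a rough sketch (the portal URL and resource ID below are hypothetical placeholders), the same capability is exposed by CKAN’s standard datastore_search_sql action:

import requests

# Query a DataStore resource with SQL via CKAN's action API.
# The portal URL and resource ID are placeholders.
resp = requests.get(
    "https://demo.ckan.org/api/3/action/datastore_search_sql",
    params={"sql": 'SELECT COUNT(*) FROM "00000000-0000-0000-0000-000000000000"'},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["result"]["records"])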

3. OpenID

ROUTE-TO-PA created an OpenID plugin for CKAN which enabled OpenID authentication on CKAN. See source code here.

4. Recommendation for related datasets

With this feature, the application recommends related datasets a user can look at based on the current selection and other contextual information. The feature guides users to find potentially useful and relevant datasets. See example in this search result for datasets on bins in Dublin, Ireland.

5. Combine Datasets Feature

This feature allows users to combine related datasets in their search results within TET into one ‘wholesome’ dataset. Along with the Refine Results feature, the Combined Datasets feature is found in the top right corner of the search results page, as in this example. Please note that only datasets with the same structure can be combined at this point. Once combined, the resulting dataset can be downloaded for use.

6. Personalized search and recommendations

The personalized search feature allows logged-in users to get personalized search results based on the details provided in their profile. In addition, logged-in users receive personalized recommendations based on their profile details.

7. Metadata quality check/validation

Extra validations were added to the dataset entry form to prevent data entry errors and to ensure consistency. A sketch of the mechanism follows below.
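This is a minimal sketch of how a custom field validator can be registered through CKAN’s standard IValidators plugin interface; the plugin class, validator name and rule are hypothetical examples, not TET’s actual code.

import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit

def non_negative(value):
    # Reject values that are not numbers or are below zero.
    try:
        number = float(value)
    except (TypeError, ValueError):
        raise toolkit.Invalid('Expected a number')
    if number < 0:
        raise toolkit.Invalid('Must not be negative')
    return value

class ExampleValidationPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IValidators)

    def get_validators(self):
        # Expose the validator to dataset schemas under a name.
        return {'non_negative': non_negative}

You can find, borrow from and contribute to the CKAN and TET code repositories on Github, join CKAN’s global user group or email serah.rono@okfn.org with any or all of your questions. Viva el open source!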