
Open Data Day 2016 Malaysia Data Expedition – Measuring Provision of Public Services for Education

- March 28, 2016 in data expedition, Open Data Day, School of Data

This blog post was written by the members of the Sinar Project in Malaysia.

In Malaysia, Sinar Project, with the support of Open Knowledge International, organised a one-day data expedition based on the guide from School of Data to search for data related to government provision of health and education services. This brought together a group of people with diverse skills to formulate questions of public interest. The data sourced would be used for analysis and visualisation in order to provide answers.

Data Expedition

A data expedition is a quest to explore uncharted areas of data and report on those findings. Participants with different skillsets gathered throughout the day at the Sinar Project office. Together they explored data relating to schools and clinics to see what data and analysis methods are available to gain insights into public service provision for education and health. We used the guides and outlines for the data expedition from the School of Data website. The role-playing guides worked as a great ice breaker. There was healthy competition over who could draw the best giraffe among those wanting to prove their mettle as a designer for the team.

Deciding what to explore, education or health?

The storyteller in the team, who was a professional journalist, started out with a few questions to explore.
  • Are there villages or towns which are far away from schools?
  • Are there villages or towns which are far away from clinics and hospitals?
  • What is the population density and provision of clinics and schools?
The scouts then went on a preliminary exploration to check whether this data exists.

Looking for the Lost City of Open Data

The Scouts, with the aid of the rest of the team, looked for data that could answer the questions. They found a lot of usable data on the Malaysian government open data portal. This data included lists of all public schools and clinics with addresses, as well as numbers of teachers for each district. Given the time limitation, the team decided to focus on answering the questions on education data. Another priority was to find data relating to class sizes to see whether schools are overcrowded. Below you can see the data that the team found.


Open Data

Data in Reports



Not all schools are created equal: there are different types, and some are considered high-achieving schools, or Sekolah Berprestasi Tinggi.


Open Data



Other Data

CIDB Construction Projects contains relevant information, such as the construction of schools and clinics.

Script to import into Elasticsearch


Sinar Project had some budgets available as open data, at state and federal levels, that could be used as an additional reference point. These were created as part of the Open Spending project.

Selangor State Government

Federal Government

Higher education



The team opted to focus on the available datasets to answer questions about education provision, first by geocoding all school addresses, and then by joining up data to find the relationship between enrollments, schools and teacher ratios.

Joining up data

To join up the different datasets, such as teacher numbers and schools, the VLOOKUP function in Excel was used to join them by school code.
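For readers who prefer a script to a spreadsheet, the same lookup can be sketched in Python using only the standard library. This is a minimal sketch, not what the team ran: the column names (KodSekolah, Enrolmen, Guru) and the sample rows are hypothetical placeholders, and the real files may be laid out differently.

```python
import csv
import io

# Hypothetical sample data; column names and values are illustrative only.
SCHOOLS_CSV = """KodSekolah,NamaSekolah,Enrolmen
ABA0001,SK Contoh Satu,420
ABA0002,SK Contoh Dua,150
ABA0003,SK Contoh Tiga,300
"""

TEACHERS_CSV = """KodSekolah,Guru
ABA0001,30
ABA0002,12
"""


def join_by_code(schools_text, teachers_text, key="KodSekolah"):
    """Left-join teacher counts onto schools, much like VLOOKUP on the school code."""
    teachers = {row[key]: row for row in csv.DictReader(io.StringIO(teachers_text))}
    joined = []
    for school in csv.DictReader(io.StringIO(schools_text)):
        match = teachers.get(school[key])
        # Where VLOOKUP would show #N/A, keep the school and leave the value empty.
        school["Guru"] = match["Guru"] if match else ""
        joined.append(school)
    return joined


def students_per_teacher(row):
    """Return enrolment divided by teachers, or None when the lookup missed."""
    if not row["Guru"] or int(row["Guru"]) == 0:
        return None
    return int(row["Enrolmen"]) / int(row["Guru"])


rows = join_by_code(SCHOOLS_CSV, TEACHERS_CSV)
for row in rows:
    print(row["KodSekolah"], row["Guru"] or "n/a", students_per_teacher(row))
```

Keeping unmatched schools in the output, rather than dropping them, makes it easy to spot codes that failed to match between the two files.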

Converting Address to geolocation (latlong)

To convert street addresses to latitude and longitude coordinates, we used the dataset with the cleansed addresses along with a geocoding tool, csvgeocode:

./node_modules/.bin/csvgeocode ./input.csv ./output.csv --url "{{Alamat}}&key=" --verbose

Convert the completed CSV to GeoJSON points

Use the csv2geojson tool:

csv2geojson --lat "Lat" --lon "Lng" Selangor_Joined_Up_Moe.csv
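To show what this conversion produces, here is a rough Python equivalent using only the standard library. It is a sketch of the general idea rather than a reimplementation of csv2geojson: the sample rows are made up, and dropping the coordinate columns from the feature properties is a choice made for this example.

```python
import csv
import io
import json

# Hypothetical geocoded rows; "Lat" and "Lng" match the column names used in the command above.
SAMPLE_CSV = """Nama,Lat,Lng
Sekolah A,3.0738,101.5183
Sekolah B,,
Sekolah C,3.3210,101.5760
"""


def csv_to_geojson(text, lat_field="Lat", lon_field="Lng"):
    """Turn CSV rows with latitude/longitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (TypeError, ValueError):
            continue  # skip rows that were not geocoded
        properties = {k: v for k, v in row.items() if k not in (lat_field, lon_field)}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_to_geojson(SAMPLE_CSV)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: GeoJSON puts longitude first, which is a common source of points landing in the wrong place on a map.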

To get population by PBT

Use the socio-economic data from the state economic planning unit's site, specifically the section Jadual 8.

To get all the schools separated by individual PBT (District)

Load the GeoJSON of schools data and the PBT boundary into QGIS, then use Vector > Geo-processing > Intersect. A post from Stack Exchange suggests it might be better to use the Vector > Spatial Query > Spatial Query option.
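The idea behind this step, assigning each school point to the boundary polygon that contains it, can be sketched without QGIS. The example below uses a plain ray-casting test on simple polygons. The boundary rectangles, PBT names and school coordinates are invented for illustration, and real boundaries with holes or multiple parts would need a proper GIS library.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a closed ring of [lon, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical boundaries: two adjacent rectangles standing in for PBT areas.
PBT_BOUNDARIES = {
    "PBT Barat": [[101.0, 3.0], [101.5, 3.0], [101.5, 3.5], [101.0, 3.5], [101.0, 3.0]],
    "PBT Timur": [[101.5, 3.0], [102.0, 3.0], [102.0, 3.5], [101.5, 3.5], [101.5, 3.0]],
}

# Hypothetical school points.
SCHOOLS = [
    {"name": "Sekolah A", "lon": 101.2, "lat": 3.2},
    {"name": "Sekolah B", "lon": 101.8, "lat": 3.4},
    {"name": "Sekolah C", "lon": 100.5, "lat": 3.2},
]


def assign_pbt(schools, boundaries):
    """Map each school name to the first boundary containing it, or None."""
    assigned = {}
    for school in schools:
        assigned[school["name"]] = next(
            (name for name, ring in boundaries.items()
             if point_in_ring(school["lon"], school["lat"], ring)),
            None,
        )
    return assigned


assignments = assign_pbt(SCHOOLS, PBT_BOUNDARIES)
print(assignments)
```

Schools that fall outside every boundary come back as None, which helps flag geocoding errors.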

Open Datasets Generated

The cleansed and joined-up datasets created during this expedition are available on GitHub. While the focus was on education, the methods were also applied to clinics because the available data is similar. See our repository.


All Primary and Secondary Schools on a Map with Google Fusion Tables


Teacher to Students per school ratios



  • Teachers vs enrollment did not provide data relating to class size or overcrowding
  • Demographic datasets are needed to measure schools against the eligible population
  • More school datasets required for teachers, specifically by subject and class ratios
  • Methods used for location of schools can also be applied to clinics & hospital data
It was discovered that additional data was needed to provide useful information on the quality of education. Not enough demographic data was found to check against the number of schools in a particular district. The teacher to student ratio was also not a good indicator of the problems reported in the news. The teacher to enrollment ratios were generally very low, with a mean of 13 and a median of 14. What was needed was the ratio by subject teachers, class size, or against the population of eligible children in each area, to provide better insights.

Automatically calculating the distance from points was also considered, matched up with whether there are school bus operators in the area. This was discussed because the distance from schools may not be relevant for rural areas, where there were not enough children to warrant a school within the distance policy. A tool to check the distance from a point to the nearest school could be built with the data made available. Civil society could use this data as evidence that the distance was too far, or that transport was not provided, for some communities.

Demographic data was found for local councils. Researchers could combine it with local council boundary data to check whether there were enough schools for the population of each local council. Interestingly, in Malaysia education is under the Federal government, and despite there being state and local education departments, the administrative boundaries do not match up with local council boundaries or electoral boundaries. This is a planning coordination challenge for policy makers. Administrative local council boundary data was made available as open data thanks to the efforts of another civil society group, Tindak Malaysia, which scanned and digitized the electoral and administrative boundaries manually.
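As an illustration of the distance-checking tool discussed above, the sketch below finds the nearest school to a given point using the haversine (great-circle) formula. It was not built during the expedition; the school names and coordinates are made up, and straight-line distance is only a rough proxy for actual travel distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_school(lat, lon, schools):
    """Return (distance_km, name) for the school closest to the given point."""
    return min(
        (haversine_km(lat, lon, school["lat"], school["lon"]), school["name"])
        for school in schools
    )


# Hypothetical geocoded schools.
SCHOOLS = [
    {"name": "Sekolah A", "lat": 3.07, "lon": 101.52},
    {"name": "Sekolah B", "lat": 3.32, "lon": 101.58},
]

distance_km, name = nearest_school(3.10, 101.50, SCHOOLS)
print(f"Nearest school: {name} ({distance_km:.1f} km)")
```

Comparing the returned distance against a policy threshold would give a first, rough check of whether a community is within reach of a school.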

Running future expeditions

This was a one-day expedition, so time was limited. For running these brief expeditions we learned the following:
  • Focus and narrow down the expedition to a specific issue
  • Be better prepared: scout for available datasets beforehand and determine the topic
  • Focus on a central repository or wiki of available data
Thank you to all of the wonderful contributors to the data expedition:
  • Lim Hui Ying (Storyteller)
  • Haris Subandie (Engineer)
  • Jack Khor (Designer)
  • Chow Chee Leong (Analyst)
  • Donaldson Tan (Engineer)
  • Michael Leow (Engineer)
  • Sze Ming (Designer)
  • Swee Meng (Engineer)
  • Hazwany (Nany) Jamaluddin (Analyst)
  • Loo (Scout)

Tips for teaching/training on data skills

- August 29, 2014 in community, data expedition, education, HowTo, training

(photo of Ignasi, Olu and Ketty by Heather Leson, July 2014 (CC-by))



You probably have a skill or knowledge that others would love to acquire… but teaching can be intimidating. Fear not! In this post, we will share a few tips from the School of Data network, which is filled with individuals who hold continuous trainings on all things data worldwide.

It’s not a great idea to improvise when you are frozen by stage fright, nor to realize in the middle of a workshop that you can’t continue as planned because you are missing materials. That’s why formal planning of each workshop can help. Here’s an example you could use.

Michael from School of Data in Berlin has a special piece of advice for your planning: “Be yourself! Find the teaching method you feel comfortable with (I like to do things ad-hoc, Anders prefers slides, e.g.)”

Also, maybe it’s a good idea to partner up. Cédric from School of Data in France makes a great point: “There are two essential things in a workshop: knowledge of methodology and knowledge of the subject. More often than not, it’s better to separate them between two people. One will make sure that the workshop goes smoothly, and the other will help individuals get past roadblocks”.

Be mindful of how you speak
Beyond what you say, the way you speak can have an impact on the success of your workshop. Michael (again) and Heather from School of Data in Toronto recommend that you try to speak a bit slower than you’re used to, with simple sentences, and avoiding jargon or descriptive metaphors.

Make it a friendly environment
Helping people feel comfortable and welcome is necessary in every educational setting. Happy from School of Data in the Philippines explains it: “The point is to keep it as trivial as possible so that people don’t feel intimidated by the skill level of others”.

Codrina from School of Data in Romania has a lot of experience here: she recommends not keeping things too serious and making small jokes instead; also, “give a little pat on the back for those who ask questions”… And don’t forget to take breaks! Yuandra from School of Data in Indonesia reminds us of something crucial: refreshments and water. People won’t learn if they’re distracted by hunger.

Also, icebreakers. We all love icebreakers, and Olu from School of Data in Nigeria has these in mind.

Try to connect with your audience
We use this phrase a lot, but what does it mean? Ketty from School of Data in Uganda puts it in very practical terms: try to read the learners’ facial expressions for signs of, for example, confusion, tiredness or intent. This will help you find the best way to continue.

Also, Ketty adds, “sometimes you have to be flexible and allow the learners to change your program… A bit of a give & take approach”.

On a slightly different topic, but still related to your connection with the audience, Olu thinks your audience will be inspired to work harder in your workshop if you tell them stories of what data/open data can be used for. You can find some at the World Bank Open Data Blog, and here on School of Data.

Some other didactic considerations
Heather recommends that you repeat key things 3 times (but not right after each other – spread them throughout the workshop). Also, Codrina recommends repeating questions when they are asked so everyone can hear before the answer is given.

Another recommendation: if you have a really successful workshop, try to replicate it through other media. For example, run it as a hangout or write it up as a tutorial. Publishing the same content in several formats won’t be redundant; it will give more and more people a chance to learn from it.

Happy has a great tip: “When you want to get the group to mingle and pair up (data analysts paired with visualizers, for example) one way to do it is to divide the group, 1 line for data analysts, another for visualizers. Then we ask them to line up according to a range of categories – from technical categories or something as simple as personal information, like the number of house they lived in during their childhood, for example”.

Make an effort to keep track of time and exactly how long you spend on each part, Cédric recommends, as this will help you plan for future trainings.

Your audience may well be outside the room where you are doing the training. Cédric adds: “Sometimes good suggestions can come from social media platforms like Twitter, so if you have an audience there, you might want to share some updates during the event. People might answer with ideas, technical advice or more”.

The workshop was fun and people attended. But did they really learn?

Try to evaluate this learning through different methods. Was everyone able to complete the exercises? What did they respond that they learned in your ‘exit survey’? Did you get good responses to your last round of oral questions?

Olu kindly shared a couple of forms that can be used for this purpose both before and after the training. Feel free to use them!

A few resources shared by the School of Data community
Notes from the OKFest How to Teach Data Session (July 2014)
Aspiration Tech has great tips in their guides (via Heather)
PSFK on how people make/learn (via Heather)
Escuela de Datos on our Local LATAM training lessons learned


The first Italian Data Expedition by OKFN Italia and ISTAT @DataLab Censimenti in Bologna

- November 12, 2013 in data expedition, Events, Open Data, Reuse, School of Data

During OKCon 2013, on 19 September at the University of Geneva, I had the opportunity to take part in the workshop Learn how to run your data expedition with the fantastic mentors of School of Data. The initiative, which the Open Knowledge Foundation staff have been taking around the world for some time, has proved extremely successful thanks to […]

The School of Data (OKFN) DataExpedition at Smartcity Exhibition

- October 18, 2013 in Culture, data expedition, istat, Open Data, School of Data, smart city exhibition

As part of the Smartcity Exhibition events, ISTAT has organised a DataLab section to introduce people to data culture and to how data can be used. In the long programme, today features the Data Expedition, which will be led by Milena Marin (connected via Skype from Brazil), Francesca De Chiara and Marco Montanari. The DataExpedition is a format of the […]