
What will we do with the 40 trillion gigabytes of data available in 2020?

- September 29, 2017 in Big Data, Dados Abertos, Inovação

Photo of a Black man holding a cell phone in his hands. He wears glasses and has a shaved head.

A person holds and looks at a cell phone. Photo: Pixabay / Creative Commons CC0.

By Thiago Ávila*

With the growth of the web and the massive use of information technologies, the amount of data generated and made available has grown exponentially. In this context, a virtuous cycle of supply and demand takes hold: the growing need for data and information drives the development of Information and Communication Technologies (ICTs), and in turn the growing capacity and number of technological tools has made this sharp growth in the production of data and information possible.

It is worth noting that, in today's global dynamics, this demand for information is diversifying, whether in how quickly information is updated, in its geographic distribution, or in areas of knowledge where information is still scarce.

The topic becomes even more relevant in light of the forecasts for the volume of data to be produced over the coming decades. The study “A Universe of Opportunities and Challenges”, developed by the consultancy EMC [1], indicates that from 2006 to 2010 the volume of digital data generated grew from 166 exabytes to 988 exabytes. As Figure 1 shows, the volume of data is expected to reach 40,000 exabytes, or 40 zettabytes (40 trillion gigabytes), by 2020.

Figure 1 – Estimated growth in the volume of digital data from 2010 to 2020 [2].
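The unit arithmetic behind the headline figure can be checked in a few lines of Python. This is a minimal sketch that assumes decimal (SI) prefixes, where one gigabyte is 10^9 bytes and one exabyte is 10^18 bytes; the function and variable names are illustrative and do not come from the study.

```python
# Decimal (SI) byte units are assumed; binary units (2**30 and so on) would differ slightly.
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

def convert(value, from_unit, to_unit):
    """Convert a quantity between byte units using exact integer arithmetic."""
    return value * from_unit // to_unit

forecast_exabytes = 40_000  # 2020 forecast cited in the text

forecast_zettabytes = convert(forecast_exabytes, EXABYTE, ZETTABYTE)
forecast_gigabytes = convert(forecast_exabytes, EXABYTE, GIGABYTE)

print(forecast_zettabytes)  # 40
print(forecast_gigabytes)   # 40000000000000, i.e. 40 trillion

# Growth factor between the 2006 and 2010 figures cited in the text.
growth_2006_2010 = 988 / 166
print(round(growth_2006_2010, 2))  # 5.95
```

In other words, 40,000 exabytes, 40 zettabytes and 40 trillion gigabytes are the same quantity, and the cited 2006 to 2010 figures imply roughly a sixfold increase.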

The same study presents other relevant figures. By 2020, investment in the digital ecosystem is expected to grow by 40% worldwide and, between 2012 and 2020, the cost of investment per gigabyte should fall from $2.00 to $0.20. In addition, the digital economy is expected to become strongly decentralized around the world, with emerging countries accounting for 62% of market share.

And what can we do with this ever-growing supply of data? An important study by the consultancy McKinsey, “Big Data: The Next Frontier For Innovation, Competition And Productivity” [2], points to several ways in which the massive use of large volumes of data, now known as Big Data, can benefit the global economy. According to the study, there are five broad ways in which using big data can create value. First, Big Data can unlock significant value in databases by making information transparent and usable at a much higher frequency. Second, organizations will increasingly be able to create and store transactional data in digital form and obtain far more accurate and detailed information on many areas, for example balancing their inventories against sales forecasts for the coming weeks or months, and thereby improve their performance. Third, Big Data improves customer relationships by enabling ever finer extraction and segmentation of a company's customer profiles. Fourth, sophisticated analytics can substantially improve decision-making. Finally, Big Data can be used to improve existing products and services and to create a new generation of them. For example, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance (preventive measures carried out before a failure occurs or is even noticed).

McKinsey also predicts that Big Data can support new waves of productivity growth, estimating a potential increase in operating margins of around 60%. The study further predicts that Big Data will become one of the key differentiators for company growth and for standing out from the competition. In light of this, companies are taking the use of large databases ever more seriously.

In government, especially in studies on open government data such as Chile's guide to opening data [3], other benefits of supplying data are identified, such as:
  • Improving the efficiency of public administration and the quality of public policies;
  • Adding value to government information and decisions;
  • Fostering innovation through the use of open data in the development of innovative applications and services;
  • Promoting economic growth through information supplied in a massive, permanent and reliable way, to be used or transformed to create new businesses and improve government services.
Fair enough. The outlook for the digital economy and the supply of data is very promising, but there are significant problems to consider, such as:
  • There may be a shortage of the talent organizations need to take advantage of Big Data's potential. By 2018, in the United States alone, a gap of 140,000 to 190,000 professionals with the skills to analyze large databases is forecast relative to demand, and at the managerial level the forecast gap is about 1.5 million managers and analysts with the know-how to use Big Data to support effective decision-making [2];
  • Several issues must be addressed to capture the potential of Big Data, such as establishing policies on privacy, information security and intellectual property, as well as reorganizing production flows to incorporate this new asset [2];
  • To make effective decisions, companies may not only organize their own information but also increasingly consume information from third parties (such as suppliers and government), which will demand even greater effort to improve the supply of data, given how increasingly decentralized these data resources are [2];
  • Regarding open government data, in 2012 there were already about 115 government data catalogs available, offering about 710,000 datasets [4]. Currently, in 2015, according to DataPortals.org, there are 417 open government data catalogs available across all continents, which confirms the rapid rise and geographic spread of this supply of data;
  • According to IBM [5], 80% of the data produced in companies is unstructured, meaning it takes much more effort to use in support of decision-making, and part of this data will certainly not be useful for that purpose;
  • As for the potential use of the world's digital data, the EMC study points to a worrying figure: in 2012, only 23% of the world's digital information was useful for generating new information and knowledge and for supporting decision-making in the context of Big Data, and only 3% was ready for immediate use (the remaining 20% still needed to be processed before it could be used) [1];
  • In the current technological scenario, the volume of data fit to be exploited for decision-making (analytic value) should reach only 33% of the total volume of 40 zettabytes [1].
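To make the percentages in the last two items concrete, the sketch below turns them into absolute volumes. It only restates the figures quoted from the EMC study; the helper name `share_of` is illustrative, and exact fractions are used so that no rounding error creeps into the result.

```python
from fractions import Fraction

TOTAL_2020_ZB = 40  # forecast total volume in zettabytes, from the text

def share_of(total, percent):
    """Return `percent` percent of `total` as an exact fraction."""
    return total * Fraction(percent, 100)

# 2020: share with analytic value versus the remainder.
analytic_zb = share_of(TOTAL_2020_ZB, 33)
remainder_zb = TOTAL_2020_ZB - analytic_zb
remainder_percent = 100 - 33

# 2012: of the 23% potentially useful, 3 points were ready for immediate use.
useful_2012 = 23
ready_2012 = 3
needs_processing_2012 = useful_2012 - ready_2012

print(float(analytic_zb))     # 13.2
print(float(remainder_zb))    # 26.8
print(remainder_percent)      # 67
print(needs_processing_2012)  # 20
```

On these figures, about 13.2 of the 40 zettabytes would carry analytic value, leaving roughly 26.8 zettabytes (67%) that may not.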
In short, on current trends, 67% of the data on offer in 2020 may be useless for reuse, for building knowledge and for supporting decision-making. This supply of data will be ever more distributed around the globe. In other words, we may do a great deal with these 40 trillion gigabytes, or simply NOTHING. Much will depend on our efforts to improve the quality of this supply of data, addressing the prerequisites for obtaining value from its use, as briefly described in this article. In upcoming articles we will explore relevant questions about data in the digital economy, presenting trends and actions under way worldwide to improve the supply of data on the web and, consequently, to realize its full potential for improving the work of governments, businesses, academia and others. Until next time.
Thiago Ávila is an advisory board member of Open Knowledge Brasil.

* These articles stem from scientific research carried out at the Núcleo de Excelência em Tecnologias Sociais (NEES) of the Instituto de Computação at the Universidade Federal de Alagoas (UFAL), with direct contributions from researchers Dr. Ig Ibert Bittencourt (UFAL), Dr. Seiji Isotani (USP), and Armando Barbosa, Danila Oliveira, Judson Bandeira, Thiago Ávila and Williams Alcântara (UFAL).

[1] Gantz, John and Reinsel, David. (2012). The Digital Universe In 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. EMC Corporation. Available at: http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf. Accessed: Jul. 2015.
[2] Manyika, James; Chui, Michael; Brown, Brad; Bughin, Jacques; Dobbs, Richard; Roxburgh, Charles & Byers, Angela Hung. (2011). Big data: The Next Frontier For Innovation, Competition, And Productivity. McKinsey Global Institute. Available at: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation. Accessed: Jul. 2015.
[3] Norma Técnica para Publicación de Datos Abiertos en Chile (2013). Gobierno de Chile. Unidad de Modernización y Gobierno Digital. Available at: http://instituciones.gobiernoabierto.cl/NormaTecnicaPublicacionDatosChile_v2-1.pdf. Accessed: May 2015.
[4] Hendler, James and Holm, Jeanne and Musialek, Chris and Thomas, George. (2012). US Government Linked Open Data: Semantic.data.gov. IEEE Intelligent Systems, p. 25-31, vol. 27. doi: 10.1109/MIS.2012.27.
[5] IBM. (n.d.). Apply New Analytics Tools To Reveal New Opportunities. IBM. Available at: http://www.ibm.com/smarterplanet/us/en/business_analytics/article/it_business_intelligence.html. Accessed: Jul. 2015.

Text published on Thiago Ávila's website. It is part of the article series Dados abertos conectados.

Open Data Day Uganda – Promoting girls in Science and Technology

- April 22, 2016 in Big Data, ckan, computer, computers, Data, Featured, india, india open data summit, Meetups, Open Data, Open Data Day, Open Science, open-education

This post was written by Alwenyi Catherine Cassidy from Fund Africa Inc. Fund Africa Inc. is powered by Open Knowledge International, in partnership with NetSquared and Communication Without Boarders.

We were excited to be part of the 2016 International Open Data Day celebration in Kampala, Uganda. The event focused on open science and on ways to encourage girls to take up Science, Technology, Engineering and Maths (STEM) in Africa. It was attended mostly by non-profit representatives, developers, data journalists, and members of the private sector.

Participants were briefed on open data: its features, its types, and its importance. This was followed by a presentation from a representative of the ‘One Million Code Girls Project’, a program that aims to teach up to one million girls aged 13 to 17 in Ugandan secondary schools how to code. Other resources shared covered project management skills, software that teams can use interactively, and the reasons for open data. The presentations were followed by a focus group discussion and online Twitter chats using the hashtag #TechchatAfrica. A few recommendations were made, and the meeting concluded with a networking session. The following are the presentations we had:

1. Trello – Ednah Karamaji

While we were waiting for more participants to arrive, Ednah Karamaji from Communications without Boarders (CWB) gave a presentation on Trello, an Android app that can be a useful tool for project management, especially for organizing events like Open Data Day. She explained several features of Trello, including team building, where a project manager can subscribe all team members to Trello, assign roles using cards, and specify the venue and time of the event. Trello also lets the user set alerts for project deadlines and mark activities as complete.

2. Introduction to Open Data – Alwenyi Catherine Cassidy

The meeting was officially opened with a prayer by Mr. Robert Kibaya of NetSquared, after which the participants were introduced to open data by Ms. Catherine Alwenyi Cassidy of Fund Africa Inc. The presentation described open data as the idea that some data should be freely available for everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The key features of openness are:
  • Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
  • Reuse and redistribution: the data must be provided under terms that permit reuse and redistribution, including intermixing with other datasets.
  • Universal participation: everyone must be able to use, reuse, and redistribute the data; there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
There are many kinds of open data that have potential uses and applications:
  • Cultural: Data about cultural works and artifacts — for example, titles and authors — generally collected and held by galleries, libraries, archives and museums.
  • Science: Data that is produced as part of scientific research from astronomy to zoology.
  • Finance: Data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds etc).
  • Statistics: Data produced by statistical offices such as the census and key socioeconomic indicators.
  • Weather: The many types of information used to understand and predict the weather and climate.
  • Environment: Information related to the natural environment, such as the presence and level of pollutants.
  • Transport: Data such as timetables, routes, on-time statistics.

3. One Million Code Girls – Ashiraf Sebandekke

Since our event focused on open science and on how to engage girls in Science, Technology, Engineering, and Mathematics (STEM), we had Ashiraf present his experience of teaching girls to code in One Million Code Girls, a project of the Google Developers Group at Makerere University Business School (MUBS – GDG) that aims to train up to one million girls to code in different programming languages, including Scratch, Java and JavaScript. As project lead, Ashiraf compared his experiences at two schools, one mixed secondary school (both boys and girls) and one single-sex school (girls only), and described how each embraced the program. He observed that the girls-only school was more conducive to learning than the mixed school, as some students in the latter felt inferior, thinking science subjects are for boys. Altogether, though, the students managed to change their mindset through the career guidance lectures given by the project facilitator, and they expect to balance other subjects with science and technology.


4. Why Open Data – Mr. Joseph Elunya

This year we also had the opportunity to hear from Mr. Elunya, a data journalist from Media Initiative for open governance and Reality Check Uganda, who explained why data should be open and not restricted by patents and copyrights, as follows:
  • Transparency: In a well-functioning, democratic society citizens need to know what their government is doing. To do that, they must be able to freely access government data and information and to share that information with other citizens. Transparency isn’t just about access; it is also about sharing and reuse. Often, to understand material it needs to be analyzed and visualized, and this requires that the material be open so that it can be freely used and reused.
  • Releasing social and commercial value: In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by governments. By opening up data, governments can help drive the creation of innovative businesses and services that deliver social and commercial value.
  • Participation and engagement: This means participatory governance or, for businesses and organizations, engaging with your users and audience. Much of the time citizens are only able to engage with their own governance sporadically, maybe just at an election every 4 or 5 years. By opening up data, citizens are enabled to be much more directly informed and involved in decision-making. This is more than transparency: it’s about making a full “read/write” society, not just about knowing what is happening in the process of governance but being able to contribute to it.

Discussion Session

The presentations were followed by active discussions. Some of the questions asked included:

“Is Open Data really a practical way to move forward?” asked Ednah, who described an incident in which a man used to take information and images from their non-profit's website and use them on his own website to solicit funds. Catherine explained some basic principles that apply when opening data, including having an open data license to make the host’s rights clear. In answer to Ednah’s question, Catherine also set out the advantages of open data, and shared how many people have learned skills like web design, programming and graphic design through material contributed freely by others on the internet. She also referenced a highly useful open source website: Wikipedia.

“To what extent should data be open?” asked Robert. Some of the participants explained that not all data should be opened; some data is sensitive and needs to be protected. Ashiraf gave the example of how Apple Inc. could not share information from a client’s phone that would be used to curb terrorism.

Some of the participants from Youth in Technology – Uganda were not conversant with the ICT laws in Uganda that protect their ideas. They said that they work sleepless nights to come up with innovations, and that simply releasing them as open source for people to use without credit didn’t make sense to them. Ashiraf explained, “All ideas need to be patented for you to be protected”. He continued by outlining a few data laws in Uganda, which include:
  • Computer Misuse Act 2011
  • Electronic Transaction Act 2011
  • Uganda Electronics Media Act
  • Data protection and Privacy Bill 2014
  • Electronics Transaction Act
The discussion continued and was also available on Twitter using the hashtag #TechchatAfrica.

Data in December: Sharing Data Journalism Love in Tunisia

- January 11, 2016 in Big Data, Data Blog, Data Expeditions, Data for CSOs, Data Journalism, data science, data visualization, Open Data, School of Data, Tunisia

NRGI hosted the event #DataMuseTunisia in collaboration with Data Aurora and School of Data senior fellow Ali Rebaie on the 11th of December 2015 in beautiful Tunis, where a group of professionals from different CSOs met in the Burge Du Lac Hotel to learn how to craft their datasets and share their stories through creative visuals. Bahia Halawi, one of the leading women data journalism practitioners in the MENA region and the co-founder of Data Aurora, led this workshop for 3 days. NRGI has been working closely with School of Data to drive economic development and transparency through data in the extractive industry. Earlier this year NRGI held similar events in Washington, Istanbul, the United Kingdom, Ghana, Tanzania, Uganda and many other places. The experience was unique, and the participants were very excited to use the open source tools and follow the data pipeline to end up with interactive stories.

The first day started with an introduction to the world of data-driven journalism and storytelling. Later on, participants looked at some of the most interesting stories worldwide before working with different layers of the data pipeline. The technical part challenged the participants to search for data related to their work and then scrape it using Google Spreadsheets, web extensions and scrapers to automate the data extraction phase. After that, each participant used Google Refine to filter and clean the datasets and remove redundancies, ending up with usable data formats. The datasets were varied: some were placed on interactive maps through CartoDB, while some participants used Datawrapper to visualize them interactively in charts. The workshop also exposed participants to Tabula, giving them the ability to transform documents from PDF to Excel. Delegates also discussed some of the challenges each of them faces at different locations in Tunisia.
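The clean-and-deduplicate step of the pipeline was done in the workshop with point-and-click tools, but the idea is easy to sketch in code for readers who prefer scripts. The example below is an illustration only, not the workshop's method: it uses Python's standard `csv` module, and the sample rows and column names are hypothetical.

```python
import csv
import io

# Hypothetical scraped data: inconsistent spacing and case, plus a repeated row.
RAW_CSV = """region,company,production
Tataouine , Company A ,120
tataouine,company a,120
Tunis,Company B, 75
"""

def clean_rows(text):
    """Trim whitespace from every field and drop rows that repeat an earlier one."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Strip stray whitespace from every key and value.
        row = {key.strip(): value.strip() for key, value in row.items()}
        # Compare rows case-insensitively so "Tataouine" and "tataouine" match.
        fingerprint = tuple(value.lower() for value in row.values())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned

rows = clean_rows(RAW_CSV)
for row in rows:
    print(row["region"], row["company"], row["production"])
```

The first occurrence of each record is kept, so the three raw rows reduce to two. Real datasets usually need fuzzier matching than a lowercase comparison, which is where a dedicated cleaning tool earns its place.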
It was very interesting to see participants share their ideas on how to approach different datasets and how to feed them into an official open data portal that can hold all these datasets together. One of the participants, Aymen Latrach, discussed the problems his team faces when it comes to data transparency about extractives in Tataouine. Other CSO members, like Manel Ben Achour, a Project Coordinator at I WATCH Organization, already came from technical backgrounds and were very happy to make use of new tools and techniques while working with their data. Most of the delegates did not come from technical backgrounds, however, and this was the real challenge. Some of the tools, even when they do not require any coding, require knowledge of some technical terms or ideas. Thus, each phase in the data pipeline started with a theoretical explanatory session to familiarize delegates with the technical concepts to be covered. After that, Bahia demonstrated the steps and went around helping any delegates who ran into problems, so that they could keep up with the rest of the group. It was a little messy at the beginning, but soon the participants got used to it and started trying out the tools on their own. In reality, trial and error is crucial to developing data journalism skills. These skills can never be attained without practice.

Another important finding, according to Bahia, who discussed with the delegates the importance of the skills they had learnt to their communities and workplaces, is that each of them had his or her own vision of how to use them. The CSO members' solid work experience allowed them to form unique visions of how to deploy what they had learnt at their workplaces. This, along with the strong belief in the change open data portals can drive in their country, are the only triggers to learning more tools and skills and bringing out better visualizations and stories that impact the people around them.

Three years ago the data journalism community was still at a very embryonic stage, with few practitioners and data initiatives taking place in Africa and Asia. Today, with enthusiastic practitioners like Bahia Halawi and Ali Rebaie, and a community like School of Data spreading the love of data and the spirit of change it can make, the data journalism field has a very promising future. The need for more initiatives and meetups to develop the skills of CSOs in the extractive industries as well as other fields remains a priority for reaching true transparency in every single domain.

Thank you. You can connect with Bahia on Twitter @HalawiBahia.

India Open Data Summit

- February 15, 2015 in Big Data, ckan, computer, computers, Data, Featured, india, india open data summit, Meetups, Open Data, Open Data Day, Open Science, open-education

You have the chance to book your spot online now! The Entry Pass is Free. Hurry!

In association with: The National Council of Education, Bengal.
Sponsored by: Open Knowledge Micro Grants
Events Partner: MeraEvents

BOOK YOUR ENTRY PASS NOW! http://www.meraevents.com/event/india-open-data-summit

Open Knowledge India is organizing the India Open Data Summit on February 28, 2015. This year’s event is free for anyone to attend. There will be talks and workshops relating to Open Data, Open Science, Open Research and Open Education. The thrust this year will be on creating a sustainable and viable citizen-driven, crowd-sourced environment for Open Data. The event is intended to be a melting pot of ideas. See you there!

Venue: Indumati Sabhagriha, 188, Raja S C Mallick Road, Kolkata – 700032, beside the Jadavpur University campus.
Time: 10:30 am

Anyone* can attend the event for free and there is no provision for tickets. However, seats are limited, so it will be wise to come early to the venue and register your name. Spread the word. Bring your friends along. Use the hashtag #OpenIndia.

* If required, the organisers will have the right to deny entry to anyone whose presence they find to be detrimental to the smooth functioning of the event.

The Data Journalism Bootcamp at AUB Lebanon

- January 29, 2015 in #OpenData Party, American University of Beirut, Big Data, bootcamp, Data Journalism, Events, fellowship, gephi, Mapping, School of Data, Workshop

Data love is spreading like never before. Unlike previous workshops we did in the MENA region, this one, which began on the 18th of January 2015 at the American University of Beirut, was an intensive data journalism workshop run over four consecutive days, in collaboration with Dr. Jad Melki, Director of the media studies program at AUB. The data team at Data Aurora were really happy to share this experience with students from different academic backgrounds, including media studies, engineering and business. The workshop was mainly led by Ali Rebaie, a Senior School of Data fellow, and Bahia Halawi, a data scientist at Data Aurora, along with the data community team assistants, Zayna Ayyad and Noor Latif. The aim of the workshop was to give the students an introduction to the world of open data, and data journalism in particular, through tutorials on the open source tools and methods used in this field. Moreover, we wanted to put students on track regarding the use of data.

On the first day, the students were introduced to data journalism from a theoretical angle, in particular the data pipeline, which outlines the phases of any data visualization project: find, get, verify, clean, analyze and present. After that, students got hands-on with scraping and cleaning data using tools such as OpenRefine and Tabula.

Day two was all about mapping, from mapping best practices to mapping formats and shapes. Students were first exposed to different types of maps and the design styles that serve the purpose of each map. The best mapping techniques and visualizations were also highlighted, along with the purpose each one serves. By the end, participants were able to tell dot maps from choropleth maps, as well as many others. Then they used Twitter data that contained geolocations to contrast tweeting zones by placing these tweets at their origins on CartoDB. Similarly, they created other maps using QGIS and TileMill.
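The difference between the two map types mentioned above comes down to the data each one needs: a dot map plots every point as it is, while a choropleth colours whole zones by an aggregate such as a count. The sketch below shows that aggregation step in plain Python. It is an illustration under loud assumptions: the zones are hypothetical bounding boxes rather than real administrative boundaries, the coordinates are made up, and the students built their maps in CartoDB, QGIS and TileMill, not with code like this.

```python
from collections import Counter

# Hypothetical zones described by bounding boxes: (min_lat, max_lat, min_lon, max_lon).
ZONES = {
    "north": (34.0, 34.7, 35.5, 36.7),
    "south": (33.0, 34.0, 35.0, 36.0),
}

# Hypothetical geolocated tweets as (latitude, longitude) pairs.
TWEETS = [(34.43, 35.84), (33.89, 35.50), (33.88, 35.49), (33.27, 35.20), (40.0, 10.0)]

def zone_of(lat, lon):
    """Return the name of the first zone whose bounding box contains the point."""
    for name, (min_lat, max_lat, min_lon, max_lon) in ZONES.items():
        if min_lat <= lat < max_lat and min_lon <= lon < max_lon:
            return name
    return None  # the point falls outside every zone

def counts_per_zone(points):
    """Aggregate points into per-zone counts, the input a choropleth needs."""
    counts = Counter()
    for lat, lon in points:
        name = zone_of(lat, lon)
        if name is not None:
            counts[name] += 1
    return counts

counts = counts_per_zone(TWEETS)
print(dict(counts))  # {'north': 1, 'south': 3}
```

A dot map would draw all five points directly; the choropleth keeps only the two zone totals and silently drops the point that falls outside both zones, which is one reason the choice of map type matters.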
The mapping exercises were really fun and students were very happy to create their own maps without a single line of code. On the third day, Bahia gave a lecture on network analysis, some important mathematical notions needed for working with graphs as well as possible uses and case studies related to this field. Meanwhile, Ali was unveiling different open data portals to provide the students with more resources and data sets. After these topics were emphasized, a technical demonstration on the use of Gephi to analyze two topics wasworkshopaub performed. Students were analyzing climate change and later, the AUB media group on Facebook was also analyzed and we had its graph drawn. It was very cool to find out that one of the top influencers in that network was among the students taking the training. Students were also taught to do the same analysis for their own friends’ lists. Facebook data was being collected through Netviz and the visualizations were being drawn using Gephi. After completing the interactive types of visualizations, the fourth day was about static ones, mainly, infographics. Each student had the chance to extract the information needed for an interesting topic to transform it into a visual piece.  Bahia was working around with students, teaching them how to refine the data so that it becomes simple and short, thus usable for building the infographic design. Later, Yousif, a senior creative designer at Data Aurora, trained the students on the use of Photoshop and illustrator, two of the tools commonly used by infographic designers. At the end of the session, each student submitted a well done infographic of which some are posted below. 
Static infographics developed by the students at the workshop.

After the workshop, Zayna had short talks with the students to get their feedback, and here she quotes some of their opinions: “It should be a full course, the performance and content was good but at some point, some data journalism tools need to be more mature and user-friendly to reduce the time needed to create a story,” said Jad Melki, Director of the media studies program at AUB, “it was great overall.” “It’s really good but the technical parts need a lot of time. We learned about new apps. Mapping, definitely I will try to learn more about it,” said Carla Sertin, a media student. “It was great we got introduced to new stuff. Mapping, I loved it and found it very useful for me,” said Ellen Francis, a civil engineering student. “The workshop was a motivation for me to work more on this,” she added, “it would work as a one semester long course.” Azza El Masri, a media student, is interested in doing an MA in data journalism. “I like it. I expected it to be a bit harder, I would prefer more advanced stuff in scraping,” she added.

The World Tweets Nelson Mandela’s Death

- December 10, 2013 in Big Data, Data Stories, Mapping, School of Data, Storytelling, visualisation, visualization

Map: The World Tweets Nelson Mandela’s Death. Click here to see the interactive version of the map above.

Data visualization is awesome! However, it achieves its goal only when it tells a story. This weekend, Mandela’s death dominated the Twitter world and hashtags mentioning Mandela were trending worldwide. I decided to design a map that would show how people around the world tweeted about the death of Nelson Mandela.

First, I started collecting tweets associated with #RIPNelsonMandela using ScraperWiki. I collected approximately 250,000 tweets on the day of Mandela’s death. You can check this great recipe on the School of Data blog on how to extract and refine tweets.

After the step above, I refined the collected tweets and uploaded the data into CartoDB. It is one of my favorite open source mapping tools and I will make sure to write a CartoDB tutorial in future posts. I used a bubble (proportional symbol) map, which is usually better for displaying raw data. Different areas had different tweeting rates, and this reflected how different countries reacted. Countries like South Africa, the UK, Spain, and Indonesia had higher tweeting rates. The diameter of the circles represents the number of retweets. As for colors, the darker a circle appears, the higher the intensity of tweets.

That’s not the whole story! It is easy to notice that some areas, such as Indonesia and Spain, have high tweeting rates. After researching this topic, I found it quite interesting that Mandela had a unique connection with Spain, one forged during two major sporting events. In 2010, for instance, Mandela was present in the stadium when Spain’s national football team won its first ever World Cup trophy. Moreover, for Indonesians, Mandela has always been a source of joy and pride, especially as he was fond of batik and often wore it, even in his international appearances.

All in all, it was evident that interesting insights can be explored and that such data visualizations can help us show the big picture. They also highlight events and facts that we are not aware of in the traditional context.
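
The bubble map was built in CartoDB without code, but the two calculations behind it, summarizing tweets per place and sizing each circle, can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the sample records, the field names `country` and `retweets`, and both helper functions are hypothetical, and the square-root scaling is a common cartographic convention, not necessarily how CartoDB sized the circles on the map above.

```python
import math
from collections import defaultdict

# Hypothetical sample records; the real dataset had roughly 250,000 tweets.
tweets = [
    {"country": "South Africa", "retweets": 90},
    {"country": "South Africa", "retweets": 10},
    {"country": "Spain", "retweets": 25},
    {"country": "Indonesia", "retweets": 16},
    {"country": None, "retweets": 3},  # no usable location, skipped below
]

def summarize_by_country(records):
    """Return {country: {"tweets": n, "retweets": total}} for located tweets."""
    summary = defaultdict(lambda: {"tweets": 0, "retweets": 0})
    for record in records:
        country = record.get("country")
        if not country:
            continue  # tweets without a location cannot be placed on the map
        summary[country]["tweets"] += 1
        summary[country]["retweets"] += record["retweets"]
    return dict(summary)

def bubble_radius(value, max_value, max_radius=40.0):
    """Scale a value so that circle area, not radius, is proportional to it."""
    if max_value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)

summary = summarize_by_country(tweets)
top = max(stats["retweets"] for stats in summary.values())
for country, stats in sorted(summary.items()):
    radius = bubble_radius(stats["retweets"], top)
    print(f"{country}: {stats['tweets']} tweets, radius {radius:.1f}")
```

With square-root scaling, a country with four times the retweets gets a circle with four times the area, instead of one that looks sixteen times as large.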

Big Data Can Do More

- July 23, 2013 in Big Data, Data Philanthropy, Featured

Anyone who wants to know the winner of the Eurovision Song Contest before the event itself, has to plan the staffing needs of a drugstore weeks in advance, or wants to improve their chances of winning the American presidential election no longer relies on the judgment of experts, their own business administration knowledge, or a team of consultants. Instead, they will try to find the answer in a mountain of data. Hailed by some as data gold and demonized by others as the end of privacy, Big Data, the somewhat vague term for the analysis of large volumes of data, has been hotly debated in Germany too since at least the middle of last year. Die Zeit, Süddeutsche Zeitung and Der Spiegel, among others, have covered the various facets of the topic in detail. At events such as Big Data – Goldmine oder Dynamit? or Big Data – Chance für Deutschland, representatives of politics, academia and the private sector discuss the potential and the dangers of the “heap of data”. The main actors in the German debate, and at the same time the target group of government funding programs, are private companies, which, according to a widely held belief, can tap considerable economic potential by analyzing Big Data. Only a company that manages to unearth the data treasure, so the tenor goes, can remain internationally competitive in the data age. Without doubt, the mountains of data that German companies collect about their customers’ shopping habits, phone usage or music preferences hold enormous potential for precisely targeted advertising and product promotion. There is certainly no shortage of examples.
One central aspect, however, has so far been left out of the German debate entirely: the data can be used not only to maximize profit, but also to help in natural disasters, to research the side effects of medications, or to contain communicable diseases. A growing number of institutions are therefore working on the challenge of making the valuable data behind corporate firewalls usable for the common good. UN Secretary-General Ban Ki-moon has launched an initiative specifically for this purpose. Its goal is to improve the work of UN organizations in developing countries by analyzing previously anonymized data that is generated every day through the use of mobile phones, the web and the like. Initial studies show, for example, that analyzing Twitter data can detect famines far earlier than is possible today.
Source: UN Global Pulse


Yet the use of data held by mobile network operators, internet companies or market research institutes by organizations working for the common good is anything but simple. Data protection in particular is a central challenge here. External access is conceivable only if privacy can be guaranteed beyond doubt through aggregation and complete anonymization (see, for example, the Space Time Boxes concept or the Differential Privacy method). Where complete anonymization is technically not feasible, and access to the data is therefore not possible, companies could integrate applications into their systems that automatically report conspicuous changes in the data; movements of large groups during natural disasters, for instance, can be determined very precisely from mobile network data. This makes faster and more targeted aid possible. It may come as a surprise that many companies are largely open to the idea of making the data they collect available for a good cause and are already experimenting, on a small scale, with providing their data. After all, for a growing number of firms, data constitutes the decisive competitive advantage. But for one thing, precisely this data could help to cope better with disasters and thereby protect (potential) customers from losing their belongings. For another, setting up a kind of data commons could obscure the origin of the data it contains. There would then no longer be a threat to competitiveness. Institutions such as the international association of mobile network operators or the World Economic Forum are already discussing the necessary legal, technical and organizational foundations with scientists, data protection experts and government representatives.
Source: UN Global Pulse
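
To give a rough idea of what the Differential Privacy method mentioned above involves: instead of releasing an exact aggregate, a data holder releases it with calibrated random noise, so that the presence or absence of any single person changes the published figure very little. The following is only a minimal sketch of the Laplace mechanism for a single count, not a description of any company's or UN Global Pulse's actual system. The function names, the subscriber count and the value of `epsilon` are assumptions chosen for illustration, and the guarantee it gives is bounded by `epsilon` rather than absolute.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered on 0.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    """Release a count with Laplace noise calibrated to sensitivity 1.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)      # fixed seed only so the sketch is repeatable
subscribers_in_cell = 1280   # hypothetical aggregate, not real data
released = noisy_count(subscribers_in_cell, epsilon=0.5, rng=rng)
print(f"released count: {released:.1f}")
```

The trade-off is between privacy and accuracy: a smaller `epsilon` hides individuals better but makes each released number less precise, which matters less for large aggregates such as crowd movements than for small groups.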


Although there are initial reflections in Germany on the role of Big Data in tackling societal challenges, the focus is still very much limited to the economic benefit. This is probably due not least to the complexity of the subject, which can only be worked through in an intensive exchange among technology and data protection experts and representatives of politics, civil society, academia and the private sector. Initial food for thought could come from the EU project Big Data Public Private Forum, in which the Open Knowledge Foundation is involved. But a systematic examination of how private-sector data could be used to address societal problems has not yet taken place in Germany. Disclaimer: The author worked for the UN initiative UN Global Pulse from November 2012 to January 2013.