
csv,conf returns for version 5 in May

- October 15, 2019 in #CSVconf, Events, Frictionless Data, News, Open Data, Open Government Data, Open Research, Open Science, Open Software

Save the date for csv,conf,v5! The fifth version of csv,conf will be held at the University of California, Washington Center in Washington DC, USA, on May 13 and 14, 2020.

If you are passionate about data and its application to society, this is the conference for you. Submissions for session proposals for 25-minute talk slots are open until February 7, 2020, and we encourage talks about how you are using data in an interesting way (like to uncover a crossword puzzle scandal). We will be opening ticket sales soon, and you can stay updated by following our Twitter account @CSVconference.

csv,conf is a community conference that is about more than just comma-separated values – it brings together a diverse group to discuss data topics including data sharing, data ethics, and data analysis from the worlds of science, journalism, government, and open source. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas (and stickers!) and kickstart collaborations.
csv,conf,v4

Attendees of csv,conf,v4

First launched in July 2014, csv,conf has expanded to bring together over 700 participants from 30 countries with backgrounds in varied disciplines. If you’ve missed the earlier years’ conferences, you can watch previous talks on topics like data ethics, open source technology, data journalism, open internet, and open science on our YouTube channel. We hope you will join us in Washington D.C. in May to share your own data stories and join the csv,conf community!

csv,conf,v5 is supported by the Sloan Foundation through OKF’s Frictionless Data for Reproducible Research grant as well as by the Gordon and Betty Moore Foundation, and the Frictionless Data team is part of the conference committee. We are happy to answer any questions you may have or offer clarifications if needed. Feel free to reach out to us at csv-conf-coord@googlegroups.com, on Twitter @CSVconference or on our dedicated community Slack channel.

We are committed to diversity and inclusion, and strive to be a supportive and welcoming environment for all attendees. To this end, we encourage you to read the Conference Code of Conduct.
Rojo the Comma Llama

While we won’t be flying Rojo the Comma Llama to DC for csv,conf,v5, we will have other mascot surprises in store.

Public money? Public code!

- September 20, 2017 in Open Software

If taxpayers pay for something, they should have access to the results of the work they paid for. This seems a very logical basic premise that no one would disagree with, but there are many cases where this is not common practice. For example, in various countries Freedom of Information laws do not fully apply to cases where governments outsource services. This would prevent you from finding out how your tax money has been spent. Or think about the cost of access to academic outputs resulting from public money: while much university research is paid for by the public, the academic outputs are locked away in academic journals, university libraries pay a lot of money to have access to these outputs, and the general public has no access at all unless they pay up.

But there is another important area where taxpayers’ money is used to lock away results. In our increasingly digitised societies, more and more software is being built by governments, or commissioned from external parties. The result of that work is in most cases proprietary software, which continues to be owned by the supplier. As a result, governments suffer from vendor lock-in, which means they rely fully on the external supplier for anything related to the software. No one else is able to provide adaptations or additions to the software, or to test it properly to make sure there are no vulnerabilities, and the government cannot easily move to a different supplier if it is unhappy with the software provided.

An easy solution for these issues exists: mandate that all software developed using public money is public code, by stipulating in all contracts with external suppliers that the software they develop is released under a Free and Open Source Software license. This issue forms the heart of the “Public Money? Public Code!” campaign the Free Software Foundation Europe launched recently.
The ultimate aim of the campaign is to make sure Free and Open Source Software will be the default option for publicly financed software everywhere. Open Knowledge International wholeheartedly supports this movement and we add our voice to the creed: If it is public money, it should be public code! Together with all signatories, we call on our representatives to take the necessary steps to require that publicly financed software developed for the public sector be made publicly available under a Free and Open Source Software licence.

This topic is dear to us at Open Knowledge International. As the original developers and one of the main stewards of the CKAN software, we try to do our bit to make sure there is trustworthy, high quality open source software available for governments to deploy. CKAN is currently used by many governments worldwide (including the US, UK, Canada, Brazil, Germany, and Australia, to name a few) to publish data. As many governments have similar needs in publishing data on their websites, it would be a waste of public money if each government commissioned the development of its own platform, or even paid a commercial supplier for a proprietary product. If a good open source solution is available, governments do not have to pay license fees for the software: they can use it for free. They can still contract an external company to deploy the open source software for them, and make any adaptations that they might want. But as long as these adaptations are also released as open source, the government is not tied to this one supplier: since the software is freely accessible, they can easily take it to a different supplier if they’re unhappy.

In practice though, this is not the case for most software in use by governments, and they continue to rely on suppliers for whom a vendor lock-in model is attractive. But we know change is possible.
We have seen some successes in the last few years in the area of academic publishing, as the open access movement has gathered steam: increasingly, funders of academic research stipulate that if you receive grants from them, you are expected to publish the results of this work under an open access license, which means that anyone can read and download the work. We hope a similar transformation is possible for publicly funded software, and we urge you all to add your signature to the campaign now!

Git for Data Analysis – why version control is essential for collaboration and for gaining public trust.

- November 29, 2016 in Featured, Frictionless Data, Open Data, Open Research, Open Science, Open Software, Open Standards

Openness and collaboration go hand in hand. Scientists at PNNL are working with the Frictionless Data team at Open Knowledge International to ensure collaboration on data analysis is seamless and their data integrity is maintained. I’m a computational biologist at the Pacific Northwest National Laboratory (PNNL), where I work on environmental and biomedical research. In our scientific endeavors, the full data life cycle typically involves new algorithms, data analysis and data management. One of the unique aspects of PNNL as a U.S. Department of Energy National Laboratory is that part of our mission is to be a resource to the scientific community. In this highly collaborative atmosphere, we are continuously engaging research partners around the country and around the world.

Image credit: Unsplash (public domain)

One of my recent research topics is how to make collaborative data analysis more efficient and more impactful. In most of my collaborations, I work with other scientists to analyze their data and look for evidence that supports or rejects a hypothesis. Because of my background in computer science, I saw many similarities between collaborative data analysis and collaborative software engineering. This led me to wonder, “We use version control for all our software products. Why don’t we use version control for data analysis?” This thought inspired my current project and has prompted other open data advocates like Open Knowledge International to propose source control for data.

Openness is a foundational principle of collaboration. To work effectively as a team, people need to be able to easily see and replicate each other’s work. In software engineering, this is facilitated by version control systems like Git or SVN. Version control has been around for decades and almost all best practices for collaborative software engineering explicitly require version control for complete sharing of source code within the development team. At the moment we don’t have a similarly ubiquitous framework for full sharing in data analysis or scientific investigation. To help create this resource, we started Active Data Biology. Although the tool is still in beta release, it lays the groundwork for open collaboration.

The original use case for Active Data Biology is to facilitate data analysis of gene expression measurements of biological samples. For example, we use the tool to investigate the changing interaction of a bacterial community over time; another great example is the analysis of global protein abundance in a collection of ovarian tumors. In both of these experiments, the fundamental data consist of two tables: 1) a matrix of gene expression values for each sample; 2) a table of metadata describing each sample. Although the original instrument files used to generate these two simple tables are often hundreds of gigabytes, the actual tables are relatively small.
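To make the two-table layout concrete, here is a minimal sketch in Python using only the standard library. The column names (`gene`, `sample`, `age`, `stage`), the sample identifiers and the values are all hypothetical and are not taken from the Active Data Biology repository. The sketch only illustrates how a gene-by-sample matrix and a per-sample metadata table can be read and cross-checked.

```python
import csv
import io

# Hypothetical miniature versions of the two tables (tab-separated).
# Real projects would read these from files in the repository.
EXPRESSION_TSV = """gene\tsample_1\tsample_2\tsample_3
geneA\t1.2\t0.8\t2.1
geneB\t0.0\t3.4\t1.7
"""

METADATA_TSV = """sample\tage\tstage
sample_1\t54\tIII
sample_2\t61\tIV
sample_3\t47\tIII
"""


def read_expression(text):
    """Return (sample_ids, {gene: [values]}) from a gene-by-sample table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = header[1:]
    matrix = {row[0]: [float(v) for v in row[1:]] for row in body}
    return samples, matrix


def read_metadata(text):
    """Return {sample_id: {column: value}} from a per-sample table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["sample"]: row for row in reader}


def unmatched_samples(samples, metadata):
    """List sample ids that appear in one table but not in the other."""
    return sorted(set(samples) ^ set(metadata))


samples, matrix = read_expression(EXPRESSION_TSV)
metadata = read_metadata(METADATA_TSV)
print(unmatched_samples(samples, metadata))  # an empty list means the tables agree
```

Because both tables are small text files, they diff cleanly under version control, which is part of what makes the repository-per-project approach practical.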


After generating data, the real goal of the experiment is to discover something profoundly new and useful, for example how bacterial growth changes over time or what proteins are correlated with surviving cancer. Such broad questions typically involve a diverse team of scientists and a lengthy and rigorous investigation. Active Data Biology uses version control as an underlying technology to ease collaboration between these large and diverse groups.

Active Data Biology creates a repository for each data analysis project. Inside the repository live the data, analysis software, and derived insight. Just as in software engineering, the repository is shared by various team members and analyses are versioned and tracked over time. Although the framework we describe here was created for our specific biological data application, it is possible to generalize the idea and adapt it to many different domains.

An example repository can be found here. This dataset originates from a proteomics study of ovarian cancer. In total, 174 tumors were analyzed to identify the abundance of several thousand proteins. The protein abundance data is located in this repository. In order to more easily analyze this with our R-based statistical code, we also store the data in an Rdata file (data.Rdata). Associated with this data file is a metadata table which describes the tumor samples, e.g. age of the patient, tumor stage, chemotherapy status, etc. It can be found at metadata.tsv. (For full disclosure, and to calm any worries, all of the samples have been de-identified and the data is approved for public release.)

Data analysis is an exploration of data, an attempt to uncover some nugget which confirms a hypothesis. Data analysis can take many forms. For me it often involves statistical tests which calculate the likelihood of an observation. For example, we observe a set of genes which have a correlated expression pattern and are enriched in a biological process.
What is the chance that this observation is random? To answer this, we use a statistical test (e.g. a Fisher’s exact test). As the specific implementation might vary from person to person, having access to the exact code is essential. There is no “half-way” sharing here. It does no good to describe analyses over the phone or through email; your collaborators need your actual data and code.

In Active Data Biology, analysis scripts are kept in the repository. This repository had a fairly simple scope for statistical analysis. The various code snippets handled data ingress, dealt with missing data (a very common occurrence in environmental or biomedical data), performed a standard test and returned the result. Over time, these scripts may evolve and change. This is exactly why we chose to use version control: to effortlessly track and share progress on the project.

We should note that we are not the only ones using version control in this manner. Open Knowledge International has a large number of GitHub repositories hosting public datasets, such as atmospheric carbon dioxide time series measurements. Vanessa Bailey and Ben Bond-Lamberty, environmental scientists at PNNL, used GitHub for an open experiment to store data, R code, a manuscript and various other aspects of analysis. The FiveThirtyEight group, led by Nate Silver, uses GitHub to share the data and code behind their stories and statistical exposés. We believe that sharing analysis in this way is critical both for helping your team work together productively and for gaining public trust.

At PNNL, we typically work in a team that includes both computational and non-computational scientists, so we wanted to create an environment where data exploration does not necessarily require computational expertise. To achieve this, we created a web-based visual analytic tool which exposes the data and capabilities within a project’s GitHub repository.
This gives non-computational researchers a more accessible interface to the data, while allowing them access to the full range of computational methods contributed by their teammates.

We first presented the Active Data Biology tool at Nature’s Publishing Better Science through Better Data conference. It was here that we met Open Knowledge International. Our shared passion for open and collaborative data through tools like Git led to a natural collaboration. We’re excited to be working with them on improving access to scientific data and results.

On the horizon, we are working together to integrate Frictionless Data and Good Tables into our tool to help validate and smooth our data access. One of the key aspects of data analysis is that it is fluid; over the course of investigation your methods and/or data will change. For that reason, it is important that the data integrity is always maintained. Good Tables is designed to enforce data quality; consistently verifying the accuracy of our data is essential in a project where many people can update the data.
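Returning to the enrichment question raised earlier (what is the chance that an observed overlap between a gene set and a biological process is random?), the following is a minimal sketch of a one-sided Fisher’s exact test written in plain Python. It is not the R code stored in the project repository. The function name, its parameters and the gene counts in the example are hypothetical, chosen only to show why two collaborators need the exact same code to get the exact same p-value.

```python
from math import comb


def fisher_enrichment_p(overlap, set_size, process_size, total_genes):
    """One-sided Fisher's exact test (upper tail of the hypergeometric).

    Returns the probability that a randomly drawn gene set of `set_size`
    genes contains at least `overlap` genes from a biological process
    that has `process_size` members among `total_genes` genes.
    """
    max_overlap = min(set_size, process_size)
    all_draws = comb(total_genes, set_size)
    tail = sum(
        comb(process_size, k) * comb(total_genes - process_size, set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / all_draws


# Hypothetical numbers: 20 measured genes, 5 belong to the process,
# and 3 of the 5 genes in our correlated set are in that process.
p_value = fisher_enrichment_p(overlap=3, set_size=5, process_size=5, total_genes=20)
print(f"p = {p_value:.4f}")
```

Choices such as one-sided versus two-sided testing, or which genes count as the background, change the result, which is why describing the analysis in words is no substitute for sharing the script itself.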


One of our real-world problems is that clinical data for biomedical projects is updated periodically as researchers re-examine patient records. Thus the metadata describing a patient’s survival status or current treatments will change. A second challenge discovered through experience is that there are a fair number of entry mistakes, typos or incorrect data formatting. Working with the Open Knowledge International team, we hope to reduce these errors at their origin by enforcing data standards on entry, and continuously throughout the project.

I look forward to data analysis having the same culture as software engineering, where openness and sharing have become the norm. Getting there will take a bit of education as well as working out some standard structures and platforms to achieve our desired goal.
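As an illustration of what enforcing data standards on entry could look like, here is a minimal sketch in plain Python. It does not use the Good Tables API. The column names, allowed values and sample rows are hypothetical. The idea is simply that every update to the metadata table is checked against explicit rules before it is accepted into the repository.

```python
import csv
import io

# Hypothetical rules for a clinical metadata table.
ALLOWED_STAGES = {"I", "II", "III", "IV"}
ALLOWED_STATUS = {"alive", "deceased"}


def validate_row(row):
    """Return a list of human-readable problems found in one metadata row."""
    problems = []
    age = row["age"]
    if not age.isdecimal() or not 0 < int(age) < 120:
        problems.append(f"age: {age!r} is not a plausible whole number")
    if row["stage"] not in ALLOWED_STAGES:
        problems.append(f"stage: {row['stage']!r} is not one of {sorted(ALLOWED_STAGES)}")
    if row["status"] not in ALLOWED_STATUS:
        problems.append(f"status: {row['status']!r} is not one of {sorted(ALLOWED_STATUS)}")
    return problems


def validate_table(text):
    """Map each failing sample id to the list of problems in its row."""
    report = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        problems = validate_row(row)
        if problems:
            report[row["sample"]] = problems
    return report


# Hypothetical table with a typo, a wrong format and a misspelled value.
METADATA_TSV = """sample\tage\tstage\tstatus
s1\t54\tIII\talive
s2\tsixty\tIV\talive
s3\t47\t3\tdeceasd
"""

report = validate_table(METADATA_TSV)
for sample, problems in report.items():
    print(sample, problems)
```

Running a check like this on every commit means a typo is caught when it is introduced, not months later when an analysis gives a surprising result.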

Open source in everyday life: How we celebrated the Software Freedom Day in Bengaluru

- October 26, 2016 in free software, india, OK India, Open Software, south east asia

Free and open source software (FOSS) enthusiasts just celebrated Software Freedom Day (SFD) on September 17 all across the world. This year, a small group of six of us gathered to celebrate SFD in the Indian city of Bengaluru. The group consisted of open source contributors from communities such as Mozilla, Wikimedia, MediaWiki and OpenStreetMap, and users of FOSS solutions. Each participant shared their own story of how they got connected with FOSS and what role it plays in their day-to-day life. The stories ranged from a father who has been trying to introduce his young son to open source while moving back and forth between proprietary and open source software as his job demands, to an OpenStreetMap contributor who truly believes that large-scale contributions to open source can make the software as robust as proprietary software, and even better because of the freedom that lies in it. All of those who gathered agreed that FOSS has widened their freedom in choosing how they want to use, share and remix the software they use.

When Software Freedom Day was started in 2004, only 12 teams from different places joined. By 2010 it had grown to a whopping 1000 across the world. About the aim of the celebration, SFD’s official website says, “Our goal in this celebration is to educate the worldwide public about the benefits of using high-quality FOSS in education, in government, at home, and in business – in short, everywhere! The non-profit organization Software Freedom International coordinates SFD at a global level, providing support, giveaways and a point of collaboration, but volunteer teams around the world organize the local SFD events to impact their communities.”

Image: SFD 2016 Bengaluru, by Nima Lama (CC BY-SA 4.0)

The participants in our group bounced both technical and philosophical questions off each other to gauge the actual usage of FOSS in real life, and whether we are moving towards adopting openness as a society.
All the participants also agreed that there is a significant disconnect in communicating widely about the work that many Indian FOSS and other free knowledge communities are doing. So they planned to meet more regularly at events organized by any of the FOSS communities and to try to connect with more people using social media and chat groups, so that these interactions grow into an annual event that brings all open communities under one roof.

What are FOSS, Free Software, Open Source, and FLOSS?

Free and open source software (FOSS or F/OSS) and Free/Libre and Open-Source Software (FLOSS) are umbrella terms that include both free software and open source software. Adopted by well-known software freedom advocate Richard Stallman in 1983, free software has many names: libre software, freedom-respecting software, and software libre are some of them. As defined by the Free Software Foundation, one of the early advocates of software freedom, free software allows users not just to use the software with complete freedom, but to study, modify, and distribute the software and any adapted versions, in both commercial and non-commercial form. The distribution of the software in commercial or non-commercial form, however, depends on the particular license the software is released under. Creative Commons recommends a broad range of free licenses that one can choose for software-related documentation and any creative work. Similarly, there are several different open licenses for software and many other works that are related to software development.

“Open Source” was coined as an alternative to free software in 1998 by educational advocacy organization Open Source Initiative. Open source software is created collaboratively, made available with its source code, and gives the user the right to study, change, and distribute the software to anyone and for any purpose.
Supported by several global organizations like Google, Canonical, Free Software Foundation, Joomla, Creative Commons and Linux Journal, Software Freedom Day draws its inspiration from the philosophy developed by people like Richard Stallman, who argues that free software is all about freedom, not necessarily about being free of cost, and that it gives users liberty from [proprietary software developers’] unjust power. SFD encourages everyone to gather in their own cities (map of places where SFD was organized this year), educate people around them about free software, promote it on social media (with the hashtag #SFD2016 this year), and even hack with free software, organize hackathons, run free software installation camps, or get creative by flying a drone running free software!


From South Asia, there were 13 celebratory events in India, 8 in Nepal, 1 in Bangladesh and 4 in Sri Lanka. South Asian countries have seen the adoption of both free software and open source software, at both individual and organizational levels and by governments. The Free Software Movement of India was founded in Bengaluru, India in 2010 to act as a national coalition of several regional chapters working to promote and grow the free software movement in India. The Indian government has launched an open data portal at data.gov.in, initiated a new policy to adopt open source software, and asked vendors to include open source software applications in Requests for Proposals. Similarly, several free and open source communities and organizations operating from the subcontinent also promote free and open source software: Mozilla India, Wikimedia India, Centre for Internet and Society and Open Knowledge India in India; Mozilla Bangladesh, Wikimedia Bangladesh, Bangladesh Open Source Network and Open Knowledge Bangladesh in Bangladesh; Mozilla Nepal, Wikimedians of Nepal and Open Knowledge Nepal in Nepal; Wikimedia Community User Group Pakistan in Pakistan; and Lanka Software Foundation in Sri Lanka.

We promote open source and open Web technologies in the country. We are open to associate/work with existing open source or other community-run, public benefit organizations.
“Internet By The People, Internet For The People” (from Mozilla India wiki)

Mohammad Jahangir Alam, a lecturer from Southern University Bangladesh, argues in a research paper that the use of open source software can help the government save the enormous amounts of money that are spent on purchasing proprietary software: “A large sum of money of government can be saved if the government uses open source software in different IT sectors of government offices and others sectors, Because the government is providing computers to all educational institute from school to university level and they are using proprietary software. For this reason, the government is to expend a significant amount of many for buying proprietary software to run the computers. Another one is government paying a significant amount of money to the different vendors for buying different types of software to implement e-Governance project. So, the Government can use open source software for implanting projects to minimize the cost of the projects.”

Check more ideas for celebrating Software Freedom Day, and a few more here, while planning for next year’s Software Freedom Day in your city.

Why Open Source Software Matters for Government and Civic Tech – and How to Support It

- July 13, 2016 in Featured, Open Data, Open Software, Open/Closed, Policy, research

Today we’re publishing a new research paper looking at whether free/open source software matters for government and civic tech. Matters in the sense that it should have a deep and strategic role in government IT and policy rather than just being a “nice to have” or something “we use when we can”. As the paper shows, the answer is a strong yes: open source software does matter for government and civic tech, and, conversely, government matters for open source. The paper covers:
  • Why open software is especially important for government and civic tech
  • Why open software needs special support and treatment by government (and funders)
  • What specific actions can be taken to provide this support for open software by government (and funders)
We also discuss how software is different from other things that governments traditionally buy or fund. This difference is why government cannot buy software like it buys office furniture or procures the building of bridges, and why buying open matters so much. The paper is authored by our President and Founder Dr Rufus Pollock.

Why Open Software

We begin with four facts about software and government which form a basis for the conclusions and recommendations that follow.
  1. The economics of software: software has high fixed costs and low (zero) marginal costs and it is also incremental in that new code builds on old. The cost structure creates a fundamental dilemma between finding ways to fund the fixed cost, e.g. by having proprietary software and raising prices; and promoting optimal access by setting the price at the marginal cost level of zero. In resolving this dilemma, proprietary software models favour the funding of fixed costs but at the cost of inefficiently high prices and hampered future development, whilst open source models favour efficient pricing and access but face the challenge of funding the fixed costs to create high quality software in the first place. The incremental nature of software sharpens this dilemma and contributes to technological and vendor lock-in.

  2. Switching costs are significant: it is (increasingly) costly to switch away from a given piece of software once you start using it. This is because you make “asset (software) specific investments”: in learning how to use the software, integrating the software with your systems, extending and customizing the software, etc. These all mean there are often substantial costs associated with switching to an alternative later.
  3. The future matters and is difficult to know: software is used for a long time — whether in its original or upgraded form. Knowing the future is therefore especially important in purchasing software. Predictions about the future in relation to software are especially hard because of its complex nature and adaptability; behavioural biases mean the level of uncertainty and likely future change are underestimated. Together these mean lock-in is under-estimated.
  4. Governments are bad at negotiating, especially in this environment, and hence the lock-in problem is especially acute for government. Governments are generally poor decision-makers and bargainers due to the incentives faced by government as a whole and by individuals within government. They are especially weak when having to make trade-offs between the near-term and the more distant future. They are even weaker when the future is complex, uncertain and hard to specify contractually up front. Software procurement has all of these characteristics, making it particularly prone to error compared to other government procurement areas.

The Logic of Support

Note: numbers in brackets e.g. (1) refer to one of the four observations of the previous section.
A. Lock-in to Proprietary Software is a Problem

Incremental Nature of Software (1) + Switching Costs (2)
imply …
Lock-in happens for a software technology, and, if it is proprietary, to a vendor

Zero Marginal Cost of Software (1) + Uncertainty about the Future in user needs and technologies (3) + Governments are Poor Bargainers (4)
imply …
Lock-in to proprietary software is a problem
Lock-in has high costs and is under-estimated – especially so for government

B. Open Source is a Solution

Lock-in is a problem
imply …
Strategies that reduce lock-in are valuable

Economics of Software (1)
imply …
Open-source is a strategy for government (and others) to reduce future lock-in
Why? Because it requires the software provider to make an up-front commitment to making the essential technology available both to users and other technologists at zero cost, both now and in the future

Together these two points
imply …
Open source is a solution
And a specific commitment to open source in government / civic tech is important and valuable

C. Open Source Needs Support
And Government / Civic Tech is an area where it can be provided effectively

Software has high fixed costs and a challenge for open source is to secure sufficient support investment to cover these fixed costs (1 – Economics)
+
Governments are large spenders on IT and are bureaucratic: they can make rules to pre-commit up front (e.g. in procurement) and can feasibly coordinate whether at local, national or, even, international levels on buying and investment decisions related to software.
imply …
Government is especially well situated to support open source
AND
Government has the tools to provide systematic support
AND
Government should provide systematic support

How to Promote Open Software

We have established in the previous section that there is a strong basis for promoting open software. This section provides specific strategic and tactical suggestions for how to do that. There are five proposals that we summarize here. Each of these is covered in more detail in the main section below. We especially emphasize the potential of the last three options as they do not require up-front participation by government and can be boot-strapped with philanthropic funding.
1. Recognize and reward open source in IT procurement. Give open source explicit recognition and beneficial treatment in procurement. Specifically, introduce into government tenders: EITHER an explicit requirement for an open source solution OR a significant points value for open source in the scoring for solutions (more than 30% of the points on offer).
2. Make government IT procurement more agile and lightweight. Current methodologies follow a “spec and deliver” model in which government attempts to define a full spec up front and then seeks solutions that deliver against this. The spec and deliver model greatly diminishes the value of open source – which allows for rapid iteration in the open, and more rapid switching of provider – and implicitly builds lock-in to the selected provider whose solution is a black-box to the buyer. In addition, whilst theoretically shifting risk to the supplier of the software, given the difficulty of specifying software up front it really just inflates upfront costs (since the supplier has to price in risk) and sets the scene for complex and cumbersome later negotiations about under-specified elements.
3. Develop a marketing and business development support organization for open source in key markets (e.g. US and Europe). The organization would be small, at least initially, and focused on three closely related activity areas (in rough order of importance):
  1. General marketing of open source to government at both local and national level: getting in front of CIOs, explaining open source, demystifying and derisking it, making the case etc. This is not specific to any specific product or solution.

  2. Supporting open source businesses, especially those at an early-stage, in initial business development activities including: connecting startups to potential customers (“opening the rolodex”) and guidance in navigating the bureaucracy of government procurement including discovering and responding to RFPs.
  3. Promoting commercialization of open source by providing advice, training and support for open source startups and developers in commercializing and marketing their technology. Open source developers and startups are often strong on technology and weak on marketing and selling their solutions and this support would help address these deficiencies.
4. Open Offsets: establish target levels of open source financing combined with an “offsets”-style scheme to discharge these obligations. An “Open Offsets” program would combine three components:
  1. Establish target commitments for funding open source for participants in the program who could include government, philanthropists and private sector. Targets would be a specific measurable figure like 20% of all IT spending or $5m.

  2. Participants discharge their funding commitment either through direct spending such as procurement or sponsorship or via purchase of open source “offsets”. “Offsets” enable organizations to discharge their open source funding obligation in an analogous manner to the way carbon offsets allow groups to deliver on their climate change commitments.
  3. Administrators of the open offset fund distribute the funds to relevant open source projects and communities in a transparent manner, likely using some combination of expert advice, community voting and value generated (this latter based on an estimate of the usage and value created by given pieces of open software).
5. “Choose Open”: a grass-roots oriented campaign to promote open software in government and government run activities such as education. “Choose Open” would be modelled on recent initiatives in online political organizing such as “Move On” in the 2004 US Presidential election as well as online initiatives like Avaaz. It would combine central provision of message, materials and policy with localized community participation to drive change.
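The arithmetic behind the Open Offsets proposal (item 4 above) can be illustrated with a short sketch. The participant names, spending figures and the 20% target below are hypothetical, and the function is only one possible reading of the first two components of the scheme: a participant sets a target share of IT spending, counts its direct open source spending against it, and buys offsets for any shortfall.

```python
def offsets_needed(it_spending, target_percent, direct_open_spending):
    """Return the value of offsets a participant must buy to meet its target.

    Amounts are whole currency units; target_percent is an integer such
    as 20 for "20% of all IT spending". Direct spending (procurement or
    sponsorship) counts first, and offsets cover any remaining shortfall.
    """
    target = it_spending * target_percent // 100
    shortfall = target - direct_open_spending
    return max(0, shortfall)


# Hypothetical participants: (name, total IT spending, direct open source spending)
participants = [
    ("City A", 10_000_000, 1_500_000),
    ("Agency B", 4_000_000, 900_000),
]

fund = 0
for name, spending, direct in participants:
    owed = offsets_needed(spending, 20, direct)
    fund += owed
    print(f"{name}: buys {owed} in offsets")

print(f"Offset fund available for distribution: {fund}")
```

In this example City A falls short of its target and buys offsets, while Agency B already exceeds its target through direct spending and owes nothing. How the resulting fund is distributed (the third component) depends on governance choices that the sketch does not model.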