
Recruiting Scientists

- June 10, 2014 in Panton Fellowships, Panton Principles

Working out where we should install our sensors

Anyone who’s been following the progress of my fellowship through my blog posts will know that I have been working towards getting sensors into schools for a while now. Well, a couple of weeks ago I finally ran an introductory session with some primary school pupils (aged 8–11) at Kibworth CE Primary School in Leicestershire. I had been developing the introductory material for a few weeks before the lesson with help from the teachers at Kibworth, who have been really responsive and open to my ideas. We decided we wanted this activity to be very student-led, with the pupils planning much of the experiment themselves, to encourage them to think in more depth about why we were doing it. We titled the introductory session “What’s in the air you breathe?”.
Snapshots of the introductory presentation “What’s in the air you breathe”

I started the session by introducing the topic of air quality to the students, from a very basic first discussion of what makes up the air through to emission sources and the health effects of air pollution. The introduction lasted less than 20 minutes and I encouraged lots of discussion, asking the students specific questions to work out what knowledge they already had and to allow them to teach one another. The response to this was great and I was impressed by how much they knew about the atmosphere: one student explained the greenhouse effect to us and another mentioned the ozone hole. I hadn’t expected them to know so much about the topics we were discussing, so I was really pleased once we started talking.

We then showed the students the equipment that they would have in school and explained what everything did. It was then over to the students to work out in groups where they wanted to install all of the sensors. To guide this decision I asked them to think about where the sources of air pollution around the school were likely to be, and where there would be people breathing it in. They quickly identified that the highest levels of pollution were likely to be in the car park, near the road and at the bottom of the playground, which is relatively close to a train line. They also told me that in the morning and afternoon lots of people would be walking through the car park, and that at lunchtimes the students would all be in the playground.

At this point one of the fundamental hurdles of being a fieldwork scientist had to be explained to the students: some of the sensors need mains power, so although the school gates might have been a good position in terms of producing interesting data, logistically it wasn’t possible to power a sensor that far from the school building. After lots of enthusiastic discussion and some expectation management, they decided that they would like to put the sensor in three positions, planning to move it around the school during the term. These were:
  1. In the playground near to the car park and the chicken coop – they wanted to see what levels of pollution the chickens were being exposed to, as well as themselves during playtime.
  2. At the bottom of the playground near to the train tracks.
  3. In the main playground where most of the students played at lunchtimes.
Lots of enthusiastic ideas…

The sensors are now with the school, waiting to be installed in the next few weeks, at which point data will start streaming in. While the students are busy being the scientists, I need to get on with planning a data analysis session that we can run before the summer holidays. Overall I’m really pleased with how the session went and I look forward to going back into the school soon.

Panton Fellow Update: Introduction to Open Research Data

- May 5, 2014 in Panton Fellowships, Panton Principles

In my first three-month update report I discussed the book I’m working on as the major output of my Panton Fellowship. Entitled Introduction to Open Research Data, the book explores both the practical and theoretical issues associated with Open Data from a range of general and disciplinary viewpoints. The book will be Open Access, available in various ebook formats and low-cost print editions, and remixing will be encouraged – particularly of the subject-specific guidance, which disciplinary communities can build upon as a foundation for a collection of resources on Open Data. Whilst I am still awaiting a couple of contributions, I am happy to be able to share a provisional table of contents for the book (chapter topics on the left, authors on the right; chapter titles still TBD):
  1. Foreword: Introduction to the Panton Fellowships
  2. Introduction to the book and the Panton Principles – Sam Moore (with input from the original Panton group)
  3. Open Content Mining – Peter Murray-Rust and Jenny Molloy
  4. Open Data and Neoliberalism – Eric Kansa
  5. Data Sharing in a Humanitarian Organization: The Experience of Médecins Sans Frontières – Unni Karunakara (previously published in PLOS Medicine)
  6. Open Data in Earth/Climate Sciences – Sarah Callaghan
  7. Open Data in Psychology – Wouter van den Bos, Mirjam Jenny and Dirk Wulff
  8. Digital Humanities and Linked Open Data – Jodi Schneider
  9. Open Data in Palaeontology – Ross Mounce
  10. Open Data in the Health Sciences – Tom Pollard
  11. Open Data in Economics – Velichka Dimitrova
  12. Why Open Drug Discovery Needs Four Simple Rules for Licensing Data and Models – Antony J. Williams, John Wilbanks and Sean Ekins (previously published in PLOS Computational Biology)
I won’t go into more detail about the content of each chapter here; authors were given free rein to approach the subject however they saw fit. I also sought permission from the authors of the previously published pieces – although they were originally published under CC BY, so permission was not strictly required – and all were happy for their contributions to appear in the book. I’m super excited about how this is coming together and I hope to have the book published by August. I will of course be posting updates along the way. Get in touch if you have any questions!

A live AQ data feed – finally!

- February 19, 2014 in Panton Fellowships, Panton Principles

As anyone who has ever done lab work will know, it always takes longer than you expect! Well, that’s definitely been the case with my sensor calibration experiments. We have got there eventually, though, and the calibration is happening this week. While all of those delays were happening, I decided to get a webpage sorted that I can use as a live data feed for the sensors and as somewhere to download the data. Version 1 of my webpage can be found here. It definitely needs a bit more work, but it currently shows data from the last three days and will soon have a way of downloading the data directly. Whilst we’re in the calibration stage the data might look a little strange, but I’ll be putting updates on the webpage regularly and will blog when the sensor is installed in the school and collecting data.

In the next few weeks I’m planning to visit the school that I am working with to decide on a deployment location with the pupils. Both the school and I want the pupils involved in the science as much as possible, so they will be helping me to pick the best location for the sensor, to install it, to take measurements and then to analyse the data. We’re hoping that this level of involvement will not only help to keep the pupils engaged but will also teach them what it’s like to be real scientists. The second facet of my work is general public engagement: I’m hoping to engage with members of the public who live or work close to the monitoring site to make them aware of the air that they breathe. This will probably start with the parents of the pupils involved in the project but will hopefully expand from there. I’ve definitely reached an exciting point in my project now, so watch out for updates…
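For the technically curious, here is a minimal Python sketch of how a “last three days” view like the one on the webpage could be produced. The CSV filename and column names are invented for illustration; the real feed’s schema may well differ:

```python
import pandas as pd

# Hypothetical schema: one row per sensor reading, with a 'timestamp' column.
df = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"])

# Keep only the last three days of readings for the live view.
cutoff = pd.Timestamp.now() - pd.Timedelta(days=3)
recent = df[df["timestamp"] >= cutoff].sort_values("timestamp")

# Write out the filtered slice that the webpage would serve for download.
recent.to_csv("live_feed_last_3_days.csv", index=False)
```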

An Update on my Panton Fellowship

- January 8, 2014 in Panton Fellowships, Panton Principles

So as month four of my Fellowship begins, it’s time to recap and reflect on what I’ve done so far and what’s still left to do… Over the last three months I’ve met and spoken to lots of interesting people; the world of open science and open data is very new to me, so making these contacts has been invaluable.

So what else have I managed to achieve? A lot of the first few months was spent sourcing the right sensors for this project and then getting them to work. As of the week before Christmas I have a working sensor, which now needs calibrating before it can be deployed (yippee!). I’ve been working with an MChem student and other colleagues on a calibration plan that can be used not only for my sensors but for the large selection of different ones we are now building up. We’re planning to run calibrations this month and then install the sensor in the first school in February.

As I’ve mentioned before, the sensor’s final destination will be at a school in Leicester, so I have been in contact with potential schools and have had a great response. The first school I’ll be working with is based just outside Leicester and they are as excited as I am about this project. We’re planning some introductory sessions outlining the project to pupils, and then data analysis sessions every term to look at the data with the pupils and get them really thinking about what they are measuring. Not only will this be a great way of teaching them about air quality issues, it will also reinforce certain areas of the curriculum.

Alongside this I have been involved in the development of some “homemade” air quality sensors, which we are hoping to deploy in Leicester this year. This design looks to be far cheaper than anything currently on the market, and the first prototype will be ready for testing next week.

So it’s been a busy few months – I’ve passed my PhD viva, started a new job and begun my Panton Fellowship – but it’s been great and I’m really looking forward to seeing what the next three have in store. My previous blog posts can be found at the links below:
http://science.okfn.org/2013/10/03/a-quick-hello-from-a-panton-fellow/
http://science.okfn.org/2013/11/01/my-first-month-as-a-panton-fellow/
http://science.okfn.org/2013/12/11/citizen-science-project-for-air-quality-measurements/

Panton Fellow Update: Samuel Moore

- January 8, 2014 in Panton Fellowships, Panton Principles, Publications

My first few months as a Panton Fellow have flown by, so I wanted to provide a quick update on the work I’ve been doing. Whilst it’s not possible to discuss everything, I thought it would be good to list some of the larger projects I’ve been working on.

Early in the fellowship I made contact with two of the Open Economics Working Group coordinators, Velichka Dimitrova and Sander Van Der Waal, to discuss how best to encourage Open Data in economics. Whilst we thought that a data journal could be a good way of incentivising data sharing, we also thought it would be sensible to conduct a survey of economists and their data sharing habits to see if our assumptions were correct. This will give us some firm evidence of the best way to advocate for Open Data in economics. The results will be released when they are available.

Staying within the OKFN framework, I also helped kick-start the Open Humanities Group back into action, through a meeting with the organisers and a post to the discussion list (posing the question: what does Open Humanities research data mean to you?). As a humanities researcher myself I am very keen to see the humanities embrace a more open approach to scholarship, and it’s great to see a resurgence of activity here. So far this has resulted in a forthcoming Open Literature Sprint on January 25th in London. The sprint will build upon some of the work already completed on the Open Literature and Textus projects for collaborating on, analysing and sharing open access and public domain works of literature and philosophy. Whilst I cannot take any credit for organising the event, I will certainly be in attendance and I encourage all those interested in Open Humanities research and data to attend too. We are looking for coders, editors and text-finders for the event – absolutely no technical skills required! You can sign up to attend here.

The majority of my time, however, has been spent working on a book: An Introduction to Open Research Data. This edited volume will feature chapters by Open Data experts in a range of academic disciplines, covering practical information on licensing, ethics and guidance for data curators, alongside more theoretical issues surrounding the adoption of Open Data. As the book will be Open Access, each chapter will be able to stand alone from the main volume, so communities can host, distribute and remix the content that is relevant to them (the book will also be available in print). The table of contents is near enough finalised and the contributions are currently being written. I’m hoping the volume will be ready by August, but watch this space! Do get in touch if you have any questions at all.

In addition, here is a round-up of the blog posts I’ve written so far:
- On the Harvard Dataverse Network Project – an open-source tool for data sharing
- What are the incentives for data sharing?
- Panton Fellow Introduction: Samuel Moore

Citizen Science Project for Air Quality Measurements

- December 11, 2013 in External Meetings, Panton Principles

Chemistry themed lunch!

I have spent the last two days at a meeting run by the Automation and Analytical Management Group (AAMG) of the Royal Society of Chemistry. As well as being held in a lovely location (the RSC’s Burlington House isn’t your average conference centre – dessert was served in beakers and the rooms are beautiful), the meeting itself was very interesting, with talks ranging from new air quality monitoring techniques to the latest deployments of sensor networks, exciting new citizen science projects and the future of air quality monitoring.
The iSPEX add-on being used to measure aerosol properties

It was this final topic that really caught my attention: a project called iSPEX, originating in the Netherlands. iSPEX is an add-on for the iPhone which allows the user to take measurements of the properties of aerosols. The project is currently being piloted in the Netherlands and has had some great success: on the first national iSPEX measurement day, more than 5,000 measurements were collected all over the country. This brilliant response shows the interest that citizen science air quality projects can generate. I personally cannot wait for the project to be extended to other countries, because I think it is gadgets like this that will really start to make headway towards increasing public interest in air quality. Other projects discussed included installing a large network of low-cost air quality sensors at Heathrow airport, and another project from Leicester where air quality outreach is also being pushed through funding from the RSC. Overall, a very positive meeting demonstrating the interest in networks of monitors and citizen science concepts.

On the Harvard Dataverse Network Project (and why it’s awesome)

- December 10, 2013 in Panton Principles, tools

I am a huge fan of grass-roots approaches to scholarly openness. Successful community-led initiatives tend to speak directly to that community’s needs and can grow by attracting interest from members on the fringes (just look at the success of the arXiv, for example). But these kinds of projects tend to be smaller in scale and can be difficult to sustain, especially without institutional backing or technical support. This is why the Harvard Dataverse Network is so great: it facilitates research data sharing through a sustainable, scalable, open-source platform maintained by the Institute for Quantitative Social Sciences at Harvard. This means it is sustainable through institutional backing, but it also empowers individual communities to manage their own research data.
In essence, a Dataverse is simply a data repository, but one that is both free to use and fully customisable according to a community’s needs. In the project’s own words:
‘A Dataverse is a container for research data studies, customized and managed by its owner. A study is a container for a research data set. It includes cataloging information, data files and complementary files.’

(http://thedata.harvard.edu/dvn/)

There are a number of ways in which the Dataverse Network can be used to enable Open Data.

Journals

A Dataverse can be a great way of incentivising data deposition among journal authors, especially when coupled with journal policies mandating Open Data for all published articles. Here, a journal’s editor or editorial team would maintain the Dataverse itself, including its look and feel, which would instil confidence in authors that their data is in trusted hands. In fact, for journals housed on Open Journal Systems, a plugin will soon be launched that directly links the article submission form with the journal’s Dataverse. From an author’s perspective, depositing data will then be as seamless as submitting a supporting information file. This presentation [pdf] goes into the plugin in more detail (and provides more information on the Dataverse project itself).

(Sub-)Disciplines

Some disciplines simply do not have their own subject-specific repository, and a Dataverse would be a great way of formalising and incentivising Open Data here. In many communities, datasets are uploaded to general repositories (Figshare, for example) that may not be tailored to their needs. Although this isn’t a problem – it’s great that general repositories exist – a discipline-maintained repository would automatically confer a level of reputation sufficient to encourage others to use it. What’s more, different communities have different preservation and metadata needs that general repositories might not be able to meet, and so a Dataverse could be tailored exactly to each community’s needs.

Individuals

Interestingly, individuals can have their own Dataverses for housing all their shared research data. This could be a great way of allowing researchers to showcase their openly available datasets (and perhaps research articles too) in self-contained collections. A Dataverse could be linked to directly from a CV or institutional homepage, offering a kind of advertisement for how open a scholar one is. Furthermore, users can search across all Dataverses for specific keywords, subject areas and so on, so there is no danger of being siloed off from the broader community.

So the Dataverse Network is a fantastic project for placing the future of Open Data in the hands of researchers, and it would be great to see it adopted by scholarly communities throughout the world.
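As a rough illustration of the kind of programmatic access a platform like this enables, here is a hedged Python sketch of searching a repository for datasets. The endpoint, parameters and response shape below are assumptions made for illustration, not the Dataverse Network’s documented API – consult the project’s documentation for the real interface:

```python
import requests

# Assumed endpoint and parameters -- for illustration only.
BASE_URL = "https://thedata.harvard.edu/api/search"  # hypothetical

response = requests.get(BASE_URL, params={"q": "air quality", "type": "dataset"})
response.raise_for_status()

# Assumed response shape: a JSON object with a list of matching items.
for item in response.json().get("items", []):
    print(item.get("name"), "-", item.get("url"))
```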

Open and transparent altmetrics for discovery

- December 9, 2013 in altmetrics, knowledge discovery, openness, Panton Principles, research, tools, Transparency

Image by AG Cann

Altmetrics are a hot topic in the scientific community right now. Classic citation-based indicators such as the impact factor are being complemented by alternative metrics generated from online platforms. Usage statistics (downloads, readership) are often employed, but links, likes and shares on the web and in social media are considered as well. The altmetrics promise, as laid out in the excellent manifesto, is that they assess impact more quickly and on a broader scale. The main focus of altmetrics at the moment is the evaluation of scientific output: examples are the article-level metrics in PLOS journals and the Altmetric donut. ImpactStory has a slightly different focus, as it aims to evaluate the oeuvre of an author rather than an individual paper. This is all well and good, but in my opinion altmetrics have a huge potential for discovery that goes beyond rankings of top papers and researchers – a potential that is largely untapped so far. How so? To answer this question, it is helpful to shed a little light on the history of citation indices.

Pathways through science

In 1955, Eugene Garfield created the Science Citation Index (SCI), which later went on to become the Web of Knowledge. His initial idea – besides measuring impact – was to record citations in a large index to create pathways through science. This makes it possible to link papers that are not linked by shared keywords, which makes a lot of sense: you can talk about the same thing using totally different terminology, especially across fields. Furthermore, terminology has proven to be very fluid even within the same domain (Leydesdorff 1997). In 1973, Small and Marshakova realized – independently of each other – that co-citation is a measure of subject similarity and can therefore be used to map a scientific field. Because citations are considerably delayed, however, co-citation maps are often a look into the past rather than a timely overview of a scientific field.
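To make the co-citation idea concrete, here is a minimal Python sketch using a made-up toy citation matrix (not real data): two papers are co-cited whenever a third paper cites both, and the co-citation counts fall out of a single matrix product.

```python
import numpy as np

# Toy example (made-up data): rows are citing papers, columns are cited papers.
# cites[k, i] = 1 means paper k cites paper i.
cites = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])

# Co-citation counts: cocite[i, j] = number of papers that cite both i and j.
cocite = cites.T @ cites
np.fill_diagonal(cocite, 0)  # ignore self-pairs

print(cocite)
# High counts indicate papers likely to share a subject, even if they
# share no keywords -- the basis for mapping a scientific field.
```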

Altmetrics for discovery

In come altmetrics. Similarly to citations, they can create pathways through science – after all, a citation is nothing but a link to another paper. With altmetrics, the question is not so much which papers are often referenced together, but rather which papers are often accessed, read, or linked together. The main advantage of altmetrics, as with impact measurement, is that they become available much earlier.
Bollen et al. (2009): Clickstream Data Yields High-Resolution Maps of Science. PLOS One. DOI: 10.1371/journal.pone.0004803.

One of the efforts in this direction is the work of Bollen et al. (2009) on clickstreams. Using the sequences of clicks to different journals, they created a map of science (see above). In my PhD, I looked at the potential of readership statistics for knowledge domain visualizations. It turns out that co-readership is a good indicator of subject similarity. This allowed me to visualize the field of educational technology based on Mendeley readership data (see below). You can find the web visualization, called Head Start, here, and the code here (username: anonymous, leave the password blank).
http://labs.mendeley.com/headstart
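In the same spirit as the co-citation example above, here is a minimal sketch of how co-readership can be turned into a similarity measure, again with a made-up reader–paper matrix rather than real Mendeley data: cosine similarity between columns yields a paper–paper similarity that could feed a knowledge domain visualization.

```python
import numpy as np

# Toy reader-paper matrix (made-up data): reads[r, p] = 1 if reader r
# has paper p in their library.
reads = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Cosine similarity between papers, based on shared readers.
norms = np.linalg.norm(reads, axis=0)
similarity = (reads.T @ reads) / np.outer(norms, norms)
np.fill_diagonal(similarity, 0)

print(np.round(similarity, 2))
# Pairs with high co-readership similarity would be drawn close together
# in a map of the field.
```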

Why we need open and transparent altmetrics

The evaluation of Head Start showed that the overview is indeed more timely than maps based on citations. However, it also provided further evidence that altmetrics are prone to sample biases: in the visualization of educational technology, computer-science-driven areas such as adaptive hypermedia are largely missing. Bollen and Van de Sompel (2008) reported the same problem when they compared rankings based on usage data to rankings based on the impact factor. It is therefore important that altmetrics are transparent and reproducible, and that the underlying data is openly available. This is the only way to ensure that all possible biases can be understood. As part of my Panton Fellowship, I will try to find datasets that satisfy these criteria. There are several examples of open bibliometric data, such as the Mendeley API and the figshare API, which have adopted CC BY, but most usage data is either not publicly available or cannot be redistributed. In my fellowship, I want to evaluate the goodness of fit of different open altmetrics data. Furthermore, I plan to create more knowledge domain visualizations such as the one above. So if you know of any good datasets, please leave a comment below. Of course, any other comments on the idea are much appreciated as well.
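One simple way to quantify the kind of disagreement Bollen and Van de Sompel observed is a rank correlation between two rankings of the same papers. A minimal sketch, using invented scores rather than real usage or citation data:

```python
from scipy.stats import spearmanr

# Made-up scores for the same five papers under two different metrics.
citation_counts = [120, 80, 45, 30, 5]
download_counts = [300, 90, 400, 20, 60]

rho, p_value = spearmanr(citation_counts, download_counts)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
# A low rho would suggest the two metrics rank papers quite differently,
# hinting at a sample bias in at least one of the data sources.
```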

What are the incentives for data sharing?

- November 5, 2013 in Open Data, Panton Principles

I have argued elsewhere that researchers should embrace scholarly openness because of the disciplinary benefits it affords. Specifically, and as is widely argued, Open Data ensures that research can be verified through replication and reused to pose and help answer new questions. Furthermore, in the humanities, Open Data can also contribute to the cultural commons, especially through initiatives such as the DPLA and Europeana. Open Data thus helps research move towards an economy of sharing, rather than one of mere competition.

But the truth is that academia can be a ruthless area to work in, and holding onto data is one way that researchers in some disciplines try to maintain a competitive advantage over their peers. For example, I recently spoke with a public health researcher who told me that she wouldn’t share any of her data until she had completely exhausted its potential for publications, which could take years. After that, she admitted, she would probably have moved on to other things and the data would be forgotten about. Whilst this anecdote reflects the practices of only one researcher, I suspect it reflects common practice for many.

Data sharing therefore needs incentives: tangible rewards for individuals that work within the current system to encourage researchers to open up their data for the wider community. Of course, mandates are important too, although they can be a blunt instrument without broad community support. What, therefore, is the best way to reward data deposition and build community momentum behind Open Data? Three ways spring to mind.

Data citation

The most obvious way to incentivise Open Data is to ensure that data creators are formally credited for their contribution through the use of citations. Adopting a standardised mechanism for citing data will recognise and reward data creators and help track the impact of individual datasets. DataCite suggests the following structure for citing a dataset:
Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

Source: http://www.datacite.org/whycitedata
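As a quick illustration, here is a minimal Python sketch that assembles a citation string following this structure; the dataset details are invented for the example.

```python
def format_data_citation(creator, year, title, version, publisher,
                         resource_type, identifier):
    """Build a citation string following the DataCite structure."""
    return (f"{creator} ({year}): {title}. {version}. "
            f"{publisher}. {resource_type}. {identifier}")

# Invented example dataset, for illustration only.
print(format_data_citation(
    creator="Smith, J.",
    year=2013,
    title="Urban Air Quality Readings",
    version="Version 1.0",
    publisher="Example Data Repository",
    resource_type="Dataset",
    identifier="doi:10.1234/example",
))
```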

Nevertheless, data citation is a new and still-developing concept, and the practicalities have yet to be fully worked out. The following report by the CODATA-ICSTI Task Group on Data Citation Standards and Practices goes into more detail on these issues: ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’.

New collaborations

Data sharing can of course lead to new collaborations with other researchers, either those looking to build upon pre-existing datasets or those grouping together to collect new data. In many ways, data sharing is an advertisement for the kind of work a researcher is doing – not just the subject expertise, but the methodological expertise too – and is a statement that one is open to sharing and collaboration. This approach is particularly prevalent in the digital humanities, which is often seen to set itself apart through its collaborative approach to scholarship (see Digital Humanities Questions & Answers for an example of this collaborative approach). As the field is in its relative infancy, many digital humanists are self-taught according to their individual needs, so there isn’t a methodological canon that researchers are taught, which makes collaborating and sharing skillsets an attractive prospect.

Perception of rigour

As Wicherts et al. demonstrated, there is a correlation between a willingness to share data and the quality of statistical reporting in psychology. Although this is only a correlation, the argument here is that researchers may take more care over the quality and presentation of their data when they have committed to sharing it, and so researchers who routinely share data can build up a reputation for scholarly rigour. Obviously this incentive is less tangible than the previous two, but it is still worth mentioning that Open Data, and openness in general, can contribute to the overall positive reputation of a researcher.

These appear to me to be the immediately obvious incentives for the average researcher to share their data, and as a Panton Fellow I’m looking to explore them further this year. I would be interested to hear of any I’ve missed!

“It’s not only peer-reviewed, it’s reproducible!”

- October 18, 2013 in Open Data, Open Source, Panton Principles, peer-review, publication process, quality, Reproducibility, reproducible

Peer review is one of the oldest and most respected instruments of quality control in science and research. Peer review means that a paper is evaluated by a number of experts on the topic of the article (the peers). The criteria may vary, but most of the time they include methodological and technical soundness, scientific relevance, and presentation. “Peer-reviewed” is a widely accepted sign of quality of a scientific paper. Peer review has its problems, but you won’t find many researchers who favour a non-peer-reviewed paper over a peer-reviewed one. As a result, if you want your paper to be scientifically acknowledged, you most likely have to submit it to a peer-reviewed journal, even though it will take more time and effort to get it published there than in a non-peer-reviewed outlet.

Peer review helps to weed out bad science and pseudo-science, but it also has serious limitations. One of these limitations is that the primary data and other supplementary material, such as documentation and source code, are usually not available. The results of the paper are thus not reproducible. When I review such a paper, I usually have to trust the authors on a number of issues: that they have described the process of achieving the results as accurately as possible, that they have not left out any crucial pre-processing steps, and so on. When I suspect a certain bias in a survey, for example, I can only note that in the review; I cannot test for that bias in the data myself. When the results of an experiment seem too good to be true, I cannot inspect the data pre-processing to see whether the authors left out any important steps. As a result, later efforts to reproduce research results can lead to devastating outcomes: Wang et al. (2010), for example, found that they could not reproduce almost all of the literature on a certain topic in computer science.

“Reproducible”: a new quality criterion

Needless to say, this is not a very desirable state. Therefore, I argue that we should start promoting a new quality criterion: “reproducible”. Reproducible means that the results achieved in the paper can be reproduced by anyone, because all of the necessary supplementary resources have been openly provided along with the paper. It is easy to see why a peer-reviewed and reproducible paper is of higher quality than one that is merely peer-reviewed: you do not have to take the researchers’ word for how they calculated their results – you can reconstruct them yourself. As a welcome side effect, this would make more datasets and source code openly available, so we could start building on each other’s work and aggregating data from different sources to gain new insights.

In my opinion, reproducible papers could be published alongside non-reproducible papers, just as peer-reviewed articles are usually published alongside editorials, letters, and other non-peer-reviewed content. I would expect, however, that over time “reproducible” would become the overall quality standard of choice – just as “peer-reviewed” is the preferred standard right now. To help this process along, journals and conferences could designate a certain share of their space to reproducible papers. I imagine they would not have to do so for long, though: researchers will aim for a higher quality standard, even if it takes more time and effort.

I do not claim that reproducibility solves all of the problems that we see in science and research right now. For example, it will still be possible to manipulate the data to a certain degree. I do, however, believe that reproducibility as an additional quality criterion would be an important step towards open and reproducible science and research. So that you can say to your colleague one day: “Let’s go with the method described in this paper. It’s not only peer-reviewed, it’s reproducible!”