Ecosystem of data tools for geologists

Free and open-source software (FOSS) for data science

Ecosystem of tools

Introduction

Geologists work with data collected from the Earth through measuring devices (e.g. compasses, GPS receivers, cameras), and we constantly face highly complex physical features and sparse datasets that can leave us feeling stuck and hopeless. Commercial software also doesn’t provide all the resources needed for problem solving. We have to process a broad range of types and sizes of structured and unstructured raw data.

In this article I will give a glimpse of some of the main tools that I find useful in my daily activities as a geologist and data scientist. For each of the following items you will find a brief description and key links so that you can go further into your topics of interest. It isn’t supposed to be a tutorial; I just intend to give the reader some orientation in the data science world, based on my self-learning experience.

Python ecosystem

Python is the first tool that I am going to explain. It is a multi-purpose interpreted programming language created by Guido van Rossum (first released in 1991) that is extensively used by scientists, developers, software engineers and hackers. Useful links:

  • python.org, the official website of the Python Software Foundation
  • anaconda.com, an open-source platform for data science

In Python you can type a command at the interactive prompt and hit the return key to see the result, without needing to compile the code first.
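As a small illustration of that interactive style, here is a minimal sketch. Each line can be typed at the prompt that appears after running `python` in a terminal. The numbers and variable names are hypothetical: a bed dipping 30 degrees, measured across a 50 m wide outcrop on flat ground along a line perpendicular to strike.

```python
import math

# Hypothetical field measurements (not from any real dataset)
dip_degrees = 30.0        # dip angle of the bed
outcrop_width_m = 50.0    # width measured perpendicular to strike on flat ground

# True thickness for this simple geometry: width * sin(dip)
true_thickness_m = outcrop_width_m * math.sin(math.radians(dip_degrees))

# At the interactive prompt, typing the variable name alone would also show its value
print(round(true_thickness_m, 1))  # prints 25.0
```

There is no separate compile step: every line is evaluated as soon as you hit return, so you can inspect and adjust intermediate values as you go.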

At the Python Package Index website, pypi.org, you can find a diverse and extensive range of Python packages that are useful for scientific and development tasks. There’s a list below with the most popular Python packages for scientific computing, data exploration and machine learning:
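Whichever packages you choose, it helps to know what is already installed in your environment. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The three package names are my own selection for illustration, not necessarily the ones the list above refers to, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Illustrative selection only; swap in the packages you care about
for name in ["numpy", "pandas", "matplotlib"]:
    version = installed_version(name)
    if version is None:
        print(name, "-> not installed (try: pip install " + name + ")")
    else:
        print(name, "->", version)
```

Packages that are missing can usually be installed from PyPI with `pip install <name>`, or with `conda install <name>` if you use Anaconda.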

Jupyter notebooks

This is one of the most popular tools for data exploration and modeling, and you can find out more about it at jupyter.org

Jupyter notebooks allow us to write code alongside well-formatted comments, and they give us a dynamic way of collaborating and sharing knowledge among the community of scientists and software developers. Notebooks grew out of the IPython project; the framework is flexible and can run different programming languages such as Python, R, Julia, C, MATLAB, and many others. Each language is loaded as a kernel in the notebook.
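To make the idea of a kernel more concrete, here is a minimal sketch of a kernel specification, the small `kernel.json` file that tells Jupyter how to start a language backend. The `argv`, `display_name` and `language` fields follow the kernel spec format used by Jupyter, and the values mirror what the Python kernel typically installs. The folder name `my-python-kernel` and the temporary location are hypothetical. In practice this file is written for you when you install a kernel.

```python
import json
import tempfile
from pathlib import Path

# Fields of a kernel spec; "{connection_file}" is a placeholder Jupyter fills in at launch
kernel_spec = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3",
    "language": "python",
}

# Hypothetical location: real specs live inside Jupyter's own kernels directory
spec_dir = Path(tempfile.mkdtemp()) / "my-python-kernel"
spec_dir.mkdir()
spec_path = spec_dir / "kernel.json"
spec_path.write_text(json.dumps(kernel_spec, indent=2))

# Show the file as Jupyter would read it
print(spec_path.read_text())
```

A kernel for another language follows the same pattern, with `argv` pointing at that language's own kernel program and `language` set accordingly.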

R programming language

R is a programming language designed for statistics and scientific computing. In its package repository, CRAN, you will find an extensive list of libraries that give us access to useful datasets and functions for data analysis. The R community also provides high-quality books, articles and tutorials. Useful links:

Written on September 1, 2017