
9. Bibliography and Resources


9.1. Bibliography

  1. Kieran Healy. Data Visualization: A Practical Introduction (2019)

  2. Claus O. Wilke. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures (2019). Available at: https://clauswilke.com/dataviz/. Accessed: 17 May 2021.

  3. Jake VanderPlas. Python Data Science Handbook. O’Reilly Media, Inc. (2016). ISBN: 9781491912058. Available at: https://jakevdp.github.io/PythonDataScienceHandbook/. Accessed: 17 May 2021.

  4. Andy Kirk. Data Visualisation: A Handbook for Data Driven Design. SAGE Publications Inc. (2019). ISBN 978-1-5264-6893-2.

  5. Edward R. Tufte. The Visual Display of Quantitative Information (2001)

  6. Robert Johansson. Numerical Python - Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib (2018) ISBN 978-1-484242-45-2. Book code site: https://nbviewer.jupyter.org/github/jrjohansson/numerical-python-book-code/tree/master/

  7. Nicolas P. Rougier, Michael Droettboom, Philip E. Bourne. Ten Simple Rules for Better Figures (2014). https://doi.org/10.1371/journal.pcbi.1003833. Available at: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833. Accessed: 17 May 2021.

  8. James R. Beniger & Dorothy L. Robyn (1978). Quantitative Graphics in Statistics: A Brief History, The American Statistician, 32:1, 1-11, DOI: 10.1080/00031305.1978.10479235

  9. Willard C. Brinton (1918). Graphic Methods for Presenting Facts, Ronald Press Company. Available at: https://archive.org/details/graphicmethodsfo00will. Accessed: 1 May 2021.

  10. William S. Cleveland & Robert McGill (1984). Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods, Journal of the American Statistical Association, 79:387, 531-554, DOI: 10.1080/01621459.1984.10478080

  11. ___. The Radar Chart and Its Caveats. Available at: https://www.data-to-viz.com/caveat/spider.html. Accessed: 5 May 2021.

9.2. Books

Other useful books on data visualization.

  1. Cole Nussbaumer Knaflic. Storytelling with Data: A Data Visualization Guide for Business Professionals (2015)

  2. Danyel Fisher, Miriah Meyer. Making Data Visual: A Practical Guide to Using Visualization for Insight (2017)

  3. Samuel Burns. Python Data Visualization: An Easy Introduction to Data Visualization in Python with Matplotlip, Pandas, and Seaborn (2019)

  4. Wes McKinney. Python for Data Analysis (1st ed.). O’Reilly Media, Inc. (2012)

9.3. Chart Galleries

These sites provide examples and definitions of charts, often accompanied by code examples.

  1. https://www.data-to-viz.com/. Classifies chart types by the format of the input data (numeric, categorical, etc.) and presents a decision tree that leads to a set of potentially more suitable visualizations. It also provides examples in Python, R, and D3.js.

  2. https://www.python-graph-gallery.com/. A library with hundreds of example charts produced in Python. The charts are organized into about 40 sections and always come with associated example code, mainly using Matplotlib but also the Seaborn and Plotly packages.

  3. https://datavizproject.com/. A library of different chart types. Charts can also be selected by function. There is no code, but each visualization comes with links to visualization tools and examples.

  4. https://datavizcatalogue.com/. A library of different types of data and information visualization. There is no code, but each visualization comes with links to visualization tools and examples.

9.4. Packages and Software

Sites of the useful packages and software used here.

  1. Python https://www.python.org/. Programming language, API and docs.

  2. NumPy https://numpy.org/. Package for scientific computing with Python, API and docs.

  3. Pandas https://pandas.pydata.org/. Data analysis and manipulation tool, API and docs.

  4. Matplotlib https://matplotlib.org/. Visualization with Python, API and docs.

  5. Seaborn https://seaborn.pydata.org/. Statistical data visualization, API and docs.

  6. SciPy https://www.scipy.org/. Python scientific ecosystem, API and docs.

  7. Scikit-learn https://scikit-learn.org/. Machine learning in Python, API and docs.

  8. Statsmodels https://www.statsmodels.org/. Statistical models, hypothesis tests, and data exploration, API and docs.

  9. Anaconda https://www.anaconda.com/products/individual. A professional Python development ecosystem that includes, besides the Python language, a Jupyter environment https://jupyter.org/ for editing notebooks locally and the Spyder IDE https://www.spyder-ide.org/, among other packages.

9.5. Tutorials and Quick References

  1. W3Schools https://www.w3schools.com/. Basic tutorials on Python, NumPy, Pandas, and Matplotlib, among others.

  2. Pandas Cheat Sheet. https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

  3. Matplotlib Cheat Sheet. https://github.com/rougier/matplotlib-cheatsheet

  4. Charting in Colaboratory https://colab.research.google.com/notebooks/charts.ipynb. Charts built with basic commands in Google Colaboratory.

  5. Scipy Lecture Notes. One document to learn numerics, science, and data with Python. http://scipy-lectures.org

  6. Selva Prabhakaran. Top 50 matplotlib Visualizations – The Master Plots (with full python code) https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/

9.6. Miscellaneous

  1. Video: Kieran Healy, Principles of Data Visualization https://youtu.be/wHrzsO564uA.

  2. Open course: Coursera, University of Michigan, Applied Plotting, Charting & Data Representation in Python https://www.coursera.org/learn/python-plotting/home/welcome.

  3. Google Colab https://colab.research.google.com/. Web environment for editing and running Python Jupyter notebooks, including tutorials on using Jupyter notebooks, Markdown, etc.

9.7. Software Versions

Versions of the main packages used, for reproducing the code in this book.

!pip install version_information
Requirement already satisfied: version_information in c:\users\user\anaconda3\lib\site-packages (1.0.3)
import numpy as np 
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
import statsmodels
%matplotlib inline

%reload_ext version_information
%version_information numpy, pandas, matplotlib, seaborn, scipy, statsmodels
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
c:\users\user\anaconda3\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

c:\users\user\anaconda3\lib\site-packages\version_information\version_information.py in _repr_html_(self)
    126         html += "<tr><th>Software</th><th>Version</th></tr>"
    127         for name, version in self.packages:
--> 128             _version = cgi.escape(version)
    129             html += "<tr><td>%s</td><td>%s</td></tr>" % (name, _version)
    130 

AttributeError: module 'cgi' has no attribute 'escape'
| Software | Version |
|---|---|
| Python | 3.8.5 64bit [MSC v.1916 64 bit (AMD64)] |
| IPython | 7.19.0 |
| OS | Windows 10 10.0.19041 SP0 |
| numpy | 1.19.2 |
| pandas | 1.1.3 |
| matplotlib | 3.3.2 |
| seaborn | 0.11.0 |
| scipy | 1.5.2 |
| statsmodels | 0.12.0 |

Mon May 24 00:26:20 2021 Hora oficial do Brasil
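The traceback above comes from the `version_information` extension calling `cgi.escape`, which was removed in Python 3.8, so the HTML rendering fails and apparently only the LaTeX rendering of the table is produced. As an alternative, the following minimal sketch reports the installed versions with the standard library's `importlib.metadata` instead of the extension. The helper name `package_versions` is hypothetical and not part of the book's code.

```python
import platform
from importlib import metadata  # in the standard library since Python 3.8


def package_versions(names):
    """Return a {name: version} dict; missing packages map to 'not installed'."""
    # Hypothetical helper: looks up each distribution's installed version.
    versions = {"Python": platform.python_version()}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    packages = ["numpy", "pandas", "matplotlib", "seaborn", "scipy", "statsmodels"]
    for name, version in package_versions(packages).items():
        print(f"{name:12s} {version}")
```

Because the lookup reads package metadata rather than importing each package, it also works in an environment where some of the listed packages are absent.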

9.8. Data Files Used

All files used in this book were extracted or adapted from public data and are replicated on the site to ensure that the programs are reproducible.

import pandas as pd
url = 'https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/'

lista = pd.read_csv(url + '_datasets_info.csv')

for dataset in lista.dataset:
  print('dataset: ' + dataset)
  print('URL    : ' + url + dataset)
  print('Source : ' + lista[ lista.dataset == dataset ].source.values[0])
  print('\n')
  df = pd.read_csv(url + dataset)
  display(df.head())
  print('\n')
dataset: mtcars.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/mtcars.csv
Source : R package, https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
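|   | model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |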
dataset: mystocksn.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/mystocksn.csv
Source : A partir do Yahoo Finance API, https://finance.yahoo.com/
|   | data | IBOV | VALE3 | PETR4 | DOLAR |
|---|---|---|---|---|---|
| 0 | 2020-02-27 | 102984.0 | 9.92 | 12.08 | 4.4491 |
| 1 | 2020-02-28 | 104172.0 | 9.82 | 12.10 | 4.4848 |
| 2 | 2020-03-02 | 106625.0 | 10.27 | 12.49 | 4.4413 |
| 3 | 2020-03-03 | 105537.0 | 10.22 | 12.16 | 4.4724 |
| 4 | 2020-03-04 | 107224.0 | 10.56 | 12.33 | 4.5132 |
dataset: Life_Expectancy_Data.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/Life_Expectancy_Data.csv
Source : WHO World Health Organization, https://www.who.int/
|   | Country | Year | Status | Life expectancy | Adult Mortality | infant deaths | Alcohol | percentage expenditure | Hepatitis B | Measles | ... | Polio | Total expenditure | Diphtheria | HIV/AIDS | GDP | Population | thinness 1-19 years | thinness 5-9 years | Income composition of resources | Schooling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2015 | Developing | 65.0 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | ... | 6.0 | 8.16 | 65.0 | 0.1 | 584.259210 | 33736494.0 | 17.2 | 17.3 | 0.479 | 10.1 |
| 1 | Afghanistan | 2014 | Developing | 59.9 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | ... | 58.0 | 8.18 | 62.0 | 0.1 | 612.696514 | 327582.0 | 17.5 | 17.5 | 0.476 | 10.0 |
| 2 | Afghanistan | 2013 | Developing | 59.9 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | ... | 62.0 | 8.13 | 64.0 | 0.1 | 631.744976 | 31731688.0 | 17.7 | 17.7 | 0.470 | 9.9 |
| 3 | Afghanistan | 2012 | Developing | 59.5 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | ... | 67.0 | 8.52 | 67.0 | 0.1 | 669.959000 | 3696958.0 | 17.9 | 18.0 | 0.463 | 9.8 |
| 4 | Afghanistan | 2011 | Developing | 59.2 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | ... | 68.0 | 7.87 | 68.0 | 0.1 | 63.537231 | 2978599.0 | 18.2 | 18.2 | 0.454 | 9.5 |

5 rows × 22 columns

dataset: gapminder_2015.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/gapminder_2015.csv
Source : Gapminder, https://www.gapminder.org/
|   | continent | country | year | demox_eiu | income_per_person | invest_%_gdp | tax_%_gdp | gini_index | LifeExpect | HappyIdx | SchoolYears15_24 | VacineBelieve | ChildMortality | Co2Emissions | CPI | Population |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Africa | Botswana | 2015 | 78.7 | 15700 | 32.1 | 24.7 | 60.5 | 66.9 | 0.376 | 8.40 | NaN | 40.7 | 2.560 | 63.0 | 2120000 |
| 1 | Africa | Burkina Faso | 2015 | 47.0 | 1600 | 24.3 | 15.1 | 35.5 | 60.7 | 0.442 | 3.76 | NaN | 86.8 | 0.182 | 38.0 | 18100000 |
| 2 | Africa | Cote d'Ivoire | 2015 | 33.1 | 3230 | 20.1 | 15.4 | 41.6 | 61.0 | 0.445 | 6.59 | NaN | 90.0 | 0.405 | 32.0 | 23200000 |
| 3 | Africa | Egypt | 2015 | 31.8 | 10200 | 14.3 | 12.5 | 31.2 | 70.2 | 0.476 | 10.60 | NaN | 23.6 | 2.370 | 36.0 | 92400000 |
| 4 | Africa | Kenya | 2015 | 53.3 | 2800 | 21.5 | 16.3 | 41.5 | 64.7 | 0.436 | 9.06 | NaN | 46.3 | 0.341 | 25.0 | 47900000 |
dataset: tips.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/tips.csv
Source : Seaborn package, https://github.com/mwaskom/seaborn-data/blob/master/tips.csv
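|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |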
dataset: sp500_ibov.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/sp500_ibov.csv
Source : A partir do Yahoo Finance API, https://finance.yahoo.com/
|   | Date | SP500 | IBOV |
|---|---|---|---|
| 0 | 2020-01-02 | 3257.850098 | 118573.0 |
| 1 | 2020-01-03 | 3234.850098 | 117707.0 |
| 2 | 2020-01-06 | 3246.280029 | 116878.0 |
| 3 | 2020-01-07 | 3237.179932 | 116662.0 |
| 4 | 2020-01-08 | 3253.050049 | 116247.0 |
dataset: T1_reshape.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/T1_reshape.csv
Source : Kaggle, adaptado de https://www.kaggle.com/berkerisen/wind-turbine-scada-dataset
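|   | WindSpeed(m/s) | ActivePower(kW) | Theoretical_Power_Curve (KWh) | Loss_Value(kW) | Loss(%) | count | direction |
|---|---|---|---|---|---|---|---|
| 0 | 3.5 | 43.46 | 70.58 | 27.12 | 38.02 | 29 | N |
| 1 | 4.0 | 88.01 | 127.57 | 39.56 | 31.60 | 101 | N |
| 2 | 4.5 | 160.51 | 217.01 | 56.50 | 26.42 | 102 | N |
| 3 | 5.0 | 274.71 | 335.67 | 60.96 | 18.44 | 99 | N |
| 4 | 5.5 | 388.60 | 465.84 | 77.24 | 16.79 | 119 | N |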
dataset: T1_reshape_mean.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/T1_reshape_mean.csv
Source : Kaggle, adaptado de https://www.kaggle.com/berkerisen/wind-turbine-scada-dataset
|   | WindSpeed(m/s) | ActivePower(kW) | Theoretical_Power_Curve (KWh) | Loss_Value(kW) | Loss(%) | count |
|---|---|---|---|---|---|---|
| 0 | 3.5 | 52.191667 | 68.939167 | 16.745833 | 23.584167 | 56.250000 |
| 1 | 4.0 | 97.756667 | 127.686667 | 29.929167 | 23.646667 | 144.666667 |
| 2 | 4.5 | 183.601667 | 221.930000 | 38.327500 | 17.488333 | 149.500000 |
| 3 | 5.0 | 286.590833 | 336.426667 | 49.835000 | 14.937500 | 143.750000 |
| 4 | 5.5 | 400.392500 | 470.090833 | 69.699167 | 14.838333 | 167.666667 |
dataset: corona_Brasil.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/corona_Brasil.csv
Source : Corona Vírus Brasil, https://covid.saude.gov.br
|   | regiao | estado | municipio | coduf | codmun | codRegiaoSaude | nomeRegiaoSaude | data | semanaEpi | populacaoTCU2019 | casosAcumulado | casosNovos | obitosAcumulado | obitosNovos | Recuperadosnovos | emAcompanhamentoNovos | interior/metropolitana |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-25 | 9.0 | 210147125.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN |
| 1 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-26 | 9.0 | 210147125.0 | 1.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | NaN |
| 2 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-27 | 9.0 | 210147125.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN |
| 3 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-28 | 9.0 | 210147125.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN |
| 4 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-29 | 9.0 | 210147125.0 | 2.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | NaN |
dataset: flights_delays_2015.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/flights_delays_2015.csv
Source : Kaggle, adaptado de https://www.kaggle.com/usdot/flight-delays
|   | arr_delay | name |
|---|---|---|
| 0 | 11.0 | United Air Lines Inc. |
| 1 | 20.0 | United Air Lines Inc. |
| 2 | 12.0 | United Air Lines Inc. |
| 3 | 7.0 | United Air Lines Inc. |
| 4 | -14.0 | United Air Lines Inc. |
dataset: glassdoordata.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/glassdoordata.csv
Source : Glassdoor, a partir de https://www.glassdoor.com.br/
|   | jobtitle | gender | age | performance | education | department | seniority | income | bonus |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Graphic Designer | Female | 18 | 5 | College | Operations | 2 | 42363 | 9938 |
| 1 | Software Engineer | Male | 21 | 5 | College | Management | 5 | 108476 | 11128 |
| 2 | Warehouse Associate | Female | 19 | 4 | PhD | Administration | 5 | 90208 | 9268 |
| 3 | Software Engineer | Male | 20 | 5 | Masters | Sales | 4 | 108080 | 10154 |
| 4 | Graphic Designer | Male | 26 | 5 | Masters | Engineering | 5 | 99464 | 9319 |
dataset: sp500_ibov_2000.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/sp500_ibov_2000.csv
Source : A partir do Yahoo Finance API, https://finance.yahoo.com/
|   | Date | SP500 | IBOV | YYYY-MM |
|---|---|---|---|---|
| 0 | 2000-01-03 | 1455.219971 | 16930.0 | 2000-01 |
| 1 | 2000-01-04 | 1399.420044 | 15851.0 | 2000-01 |
| 2 | 2000-01-05 | 1402.109985 | 16245.0 | 2000-01 |
| 3 | 2000-01-06 | 1403.449951 | 16107.0 | 2000-01 |
| 4 | 2000-01-07 | 1441.469971 | 16309.0 | 2000-01 |
dataset: energy_types.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/energy_types.csv
Source : R for Data Science Online Learning Community, https://github.com/rfordatascience
|   | country | country_name | type | level | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|---|---|
| 0 | BE | Belgium | Conventional thermal | Level 1 | 30728.0 | 31316.0 | 30092.635 |
| 1 | BE | Belgium | Nuclear | Level 1 | 41430.0 | 40128.5 | 26995.628 |
| 2 | BE | Belgium | Hydro | Level 1 | 1476.0 | 1360.9 | 1239.248 |
| 3 | BE | Belgium | Pumped hydro power | Level 2 | 1110.0 | 1093.2 | 983.190 |
| 4 | BE | Belgium | Wind | Level 1 | 5340.0 | 6387.9 | 7177.346 |
dataset: nightingale.csv
URL    : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/nightingale.csv
Source : Datasets for Visualization Construction, https://visdatasets.github.io/index.html
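|   | Date | Month | Year | Army | Disease | Wounds | Other | Disease.rate | Wounds.rate | Other.rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1854-04-01 | Apr | 1854 | 8571 | 1 | 0 | 5 | 1.4 | 0.0 | 7.0 |
| 1 | 1854-05-01 | May | 1854 | 23333 | 12 | 0 | 9 | 6.2 | 0.0 | 4.6 |
| 2 | 1854-06-01 | Jun | 1854 | 28333 | 11 | 0 | 6 | 4.7 | 0.0 | 2.5 |
| 3 | 1854-07-01 | Jul | 1854 | 28722 | 359 | 0 | 23 | 150.0 | 0.0 | 9.6 |
| 4 | 1854-08-01 | Aug | 1854 | 30246 | 828 | 1 | 30 | 328.5 | 0.4 | 11.9 |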
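
To work with just one of the files listed above, the base URL and the file name can be joined directly. The following is a minimal sketch, assuming the repository layout shown in the output stays unchanged. `dataset_url` and `load_dataset` are hypothetical helper names, and loading a file requires pandas and network access.

```python
# Base URL copied from the listing code above.
BASE_URL = (
    "https://raw.githubusercontent.com/Rogerio-mack/"
    "Visualizacao-de-Dados-em-Python/main/data/"
)


def dataset_url(filename):
    """Build the raw GitHub URL for one of the book's data files."""
    return BASE_URL + filename


def load_dataset(filename, **read_csv_kwargs):
    """Download one data file into a DataFrame (hypothetical helper)."""
    import pandas as pd  # imported here so dataset_url works without pandas

    return pd.read_csv(dataset_url(filename), **read_csv_kwargs)


print(dataset_url("mtcars.csv"))
```

For example, `load_dataset('sp500_ibov.csv', parse_dates=['Date'])` would read the index series with the `Date` column parsed as datetimes, which is usually what a time-series plot needs.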