9. Bibliography and Resources
9.1. Bibliography
Kieran Healy. Data Visualization: A Practical Introduction (2019)
Claus O. Wilke. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures (2019). Available at: https://clauswilke.com/dataviz/. Accessed: 17.05.2021
Jake VanderPlas. Python Data Science Handbook. O’Reilly Media, Inc. (2016). ISBN: 9781491912058. Available at: https://jakevdp.github.io/PythonDataScienceHandbook/. Accessed: 17.05.2021
Andy Kirk. Data Visualisation: A Handbook for Data Driven Design. SAGE Publications Inc. (2019). ISBN 978-1-5264-6893-2.
Edward R. Tufte. The Visual Display of Quantitative Information (2001)
Robert Johansson. Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib (2018). ISBN 978-1-484242-45-2. Book code site: https://nbviewer.jupyter.org/github/jrjohansson/numerical-python-book-code/tree/master/
Nicolas P. Rougier, Michael Droettboom, Philip E. Bourne. Ten Simple Rules for Better Figures (2014). https://doi.org/10.1371/journal.pcbi.1003833. Available at: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833. Accessed: 17.05.2021
James R. Beniger & Dorothy L. Robyn (1978). Quantitative Graphics in Statistics: A Brief History. The American Statistician, 32:1, 1-11, DOI: 10.1080/00031305.1978.10479235
Willard C. Brinton (1918). Graphic Methods for Presenting Facts. Ronald Press Company. Available at: https://archive.org/details/graphicmethodsfo00will. Accessed: 01.05.2021
William S. Cleveland & Robert McGill (1984). Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal of the American Statistical Association, 79:387, 531-554, DOI: 10.1080/01621459.1984.10478080
___. The Radar Chart and Its Caveats. Available at: https://www.data-to-viz.com/caveat/spider.html. Accessed: 05.05.2021
9.2. Books
Other useful books on data visualization.
Cole Nussbaumer Knaflic. Storytelling with Data: A Data Visualization Guide for Business Professionals (2015)
Danyel Fisher, Miriah Meyer. Making Data Visual: A Practical Guide to Using Visualization for Insight (2017)
Samuel Burns. Python Data Visualization: An Easy Introduction to Data Visualization in Python with Matplotlib, Pandas, and Seaborn (2019)
Wes McKinney. Python for Data Analysis (1st ed.). O’Reilly Media, Inc. (2012)
9.3. Chart Galleries
These sites provide examples and definitions of charts, often accompanied by sample code.
https://www.data-to-viz.com/. Classifies chart types according to the format of the input data (numeric, categorical, etc.) and provides a decision tree that leads to a set of potentially more suitable visualizations. It also presents examples in Python, R and D3.js.
https://www.python-graph-gallery.com/. A collection of hundreds of chart examples produced in Python. The charts are organized into about 40 sections and always come with associated sample code, mainly using Matplotlib but also the Seaborn and Plotly packages.
https://datavizproject.com/. A library of different chart types, which can also be browsed by function. It provides no code, but links each visualization to visualization tools and examples.
https://datavizcatalogue.com/. A library of different types of data and information visualization. It provides no code, but links each visualization to visualization tools and examples.
9.4. Packages and Software
Websites of useful packages and software used in this book.
Python https://www.python.org/. Programming language, API and docs.
NumPy https://numpy.org/. Package for scientific computing with Python, API and docs.
Pandas https://pandas.pydata.org/. Data analysis and manipulation tool, API and docs.
Matplotlib https://matplotlib.org/. Visualization with Python, API and docs.
Seaborn https://seaborn.pydata.org/. Statistical data visualization, API and docs.
SciPy https://www.scipy.org/. Python scientific ecosystem, API and docs.
Scikit-learn https://scikit-learn.org/. Machine learning in Python, API and docs.
Statsmodels https://www.statsmodels.org/. Statistical models, hypothesis tests and data exploration, API and docs.
Anaconda https://www.anaconda.com/products/individual. A professional Python development ecosystem that includes, besides the Python language itself, the Jupyter environment (https://jupyter.org/) for editing notebooks locally and the Spyder IDE (https://www.spyder-ide.org/), among other packages.
9.5. Tutorials and Quick References
W3Schools https://www.w3schools.com/. Basic tutorials on Python, NumPy, Pandas and Matplotlib, among others.
Pandas Cheat Sheet. https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Matplotlib Cheat Sheet. https://github.com/rougier/matplotlib-cheatsheet
Charting in Colaboratory. https://colab.research.google.com/notebooks/charts.ipynb Charts built with basic commands in Google Colaboratory.
Scipy Lecture Notes. One document to learn numerics, science, and data with Python. http://scipy-lectures.org
Selva Prabhakaran. Top 50 matplotlib Visualizations – The Master Plots (with full python code) https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/
9.6. Miscellaneous
Video: Kieran Healy, Principles of Data Visualization. https://youtu.be/wHrzsO564uA
Open course: Coursera, University of Michigan, Applied Plotting, Charting & Data Representation in Python. https://www.coursera.org/learn/python-plotting/home/welcome
Google Colab https://colab.research.google.com/. Web environment for editing and running Python Jupyter notebooks, including tutorials on using Jupyter notebooks, Markdown, etc.
9.7. Software Versions
Versions of the main packages used, for reproducing the code in this book.
import sys
import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns
import scipy
import statsmodels

# The %version_information magic fails on Python 3.8+ (AttributeError: module 'cgi' has no
# attribute 'escape'), because cgi.escape was removed from the standard library.
# Printing each package's __version__ attribute reports the same information.
print('python', sys.version.split()[0])
for module in (np, pd, matplotlib, sns, scipy, statsmodels):
    print(module.__name__, module.__version__)
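Another way to report the versions, one that does not depend on the `version_information` extension and does not even import the packages, is a minimal sketch based on the standard-library `importlib.metadata` module (Python 3.8+). The distribution names in `PACKAGES` are assumed to be the PyPI names of the packages used in this book, and the helper `package_versions` is hypothetical, written only for illustration; a package that is not installed is reported as such instead of raising an error.

```python
import sys
from importlib.metadata import version, PackageNotFoundError  # standard library, Python 3.8+

# PyPI distribution names assumed for the packages used in this book
PACKAGES = ['numpy', 'pandas', 'matplotlib', 'seaborn', 'scipy', 'statsmodels']

def package_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions

# Print the interpreter version followed by one line per package
print(f"{'python':<12} {sys.version.split()[0]}")
for name, ver in package_versions(PACKAGES).items():
    print(f"{name:<12} {ver or 'not installed'}")
```

Reading the metadata instead of the `__version__` attribute also works for packages that fail to import, which can help when diagnosing a broken environment.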
9.8. Data Files Used
All files used in this book were extracted or adapted from public data and are mirrored on the book's site to ensure that the programs can be reproduced.
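import pandas as pd
from IPython.display import display  # built into Jupyter; imported explicitly for use outside notebooks

# Base URL of the book's data folder and the index listing every dataset and its source
url = 'https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/'
lista = pd.read_csv(url + '_datasets_info.csv')

# For each dataset, print its name, URL and source, then show its first rows
for dataset in lista.dataset:
    print('dataset: ' + dataset)
    print('URL : ' + url + dataset)
    print('Source : ' + lista[lista.dataset == dataset].source.values[0])
    print('\n')
    df = pd.read_csv(url + dataset)
    display(df.head())
    print('\n')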
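The loop downloads every file listed in the index. When only one dataset is needed, a minimal sketch is to wrap the URL construction and the source lookup in two small functions. The names `BASE_URL`, `dataset_url` and `dataset_source` are hypothetical, introduced only for illustration, and the index is assumed to have the `dataset` and `source` columns used in the loop; `set_index(...).loc[...]` stands in for the boolean-mask lookup `lista[lista.dataset == dataset].source.values[0]`.

```python
import pandas as pd

# Base URL of the book's data folder, as used in the loop above
BASE_URL = 'https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/'

def dataset_url(name, base_url=BASE_URL):
    """Return the raw-file URL of a dataset given its file name (hypothetical helper)."""
    return base_url + name

def dataset_source(index, name):
    """Return the 'source' entry of a dataset from an index DataFrame with 'dataset' and 'source' columns."""
    sources = index.set_index('dataset')['source']
    return sources.loc[name]  # raises KeyError if the dataset is not listed
```

Under this sketch, `pd.read_csv(dataset_url('tips.csv'))` would load the `tips.csv` file listed below, and `dataset_source(lista, 'tips.csv')` would return its source string from the index read in the loop; both calls need network access to reach the repository.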
dataset: mtcars.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/mtcars.csv
Source : R package, https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
| | model | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
4 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
dataset: mystocksn.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/mystocksn.csv
Source : From the Yahoo Finance API, https://finance.yahoo.com/
| | data | IBOV | VALE3 | PETR4 | DOLAR |
---|---|---|---|---|---|
0 | 2020-02-27 | 102984.0 | 9.92 | 12.08 | 4.4491 |
1 | 2020-02-28 | 104172.0 | 9.82 | 12.10 | 4.4848 |
2 | 2020-03-02 | 106625.0 | 10.27 | 12.49 | 4.4413 |
3 | 2020-03-03 | 105537.0 | 10.22 | 12.16 | 4.4724 |
4 | 2020-03-04 | 107224.0 | 10.56 | 12.33 | 4.5132 |
dataset: Life_Expectancy_Data.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/Life_Expectancy_Data.csv
Source : WHO World Health Organization, https://www.who.int/
| | Country | Year | Status | Life expectancy | Adult Mortality | infant deaths | Alcohol | percentage expenditure | Hepatitis B | Measles | ... | Polio | Total expenditure | Diphtheria | HIV/AIDS | GDP | Population | thinness 1-19 years | thinness 5-9 years | Income composition of resources | Schooling |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2015 | Developing | 65.0 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | ... | 6.0 | 8.16 | 65.0 | 0.1 | 584.259210 | 33736494.0 | 17.2 | 17.3 | 0.479 | 10.1 |
1 | Afghanistan | 2014 | Developing | 59.9 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | ... | 58.0 | 8.18 | 62.0 | 0.1 | 612.696514 | 327582.0 | 17.5 | 17.5 | 0.476 | 10.0 |
2 | Afghanistan | 2013 | Developing | 59.9 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | ... | 62.0 | 8.13 | 64.0 | 0.1 | 631.744976 | 31731688.0 | 17.7 | 17.7 | 0.470 | 9.9 |
3 | Afghanistan | 2012 | Developing | 59.5 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | ... | 67.0 | 8.52 | 67.0 | 0.1 | 669.959000 | 3696958.0 | 17.9 | 18.0 | 0.463 | 9.8 |
4 | Afghanistan | 2011 | Developing | 59.2 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | ... | 68.0 | 7.87 | 68.0 | 0.1 | 63.537231 | 2978599.0 | 18.2 | 18.2 | 0.454 | 9.5 |
5 rows × 22 columns
dataset: gapminder_2015.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/gapminder_2015.csv
Source : Gapminder, https://www.gapminder.org/
| | continent | country | year | demox_eiu | income_per_person | invest_%_gdp | tax_%_gdp | gini_index | LifeExpect | HappyIdx | SchoolYears15_24 | VacineBelieve | ChildMortality | Co2Emissions | CPI | Population |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Africa | Botswana | 2015 | 78.7 | 15700 | 32.1 | 24.7 | 60.5 | 66.9 | 0.376 | 8.40 | NaN | 40.7 | 2.560 | 63.0 | 2120000 |
1 | Africa | Burkina Faso | 2015 | 47.0 | 1600 | 24.3 | 15.1 | 35.5 | 60.7 | 0.442 | 3.76 | NaN | 86.8 | 0.182 | 38.0 | 18100000 |
2 | Africa | Cote d'Ivoire | 2015 | 33.1 | 3230 | 20.1 | 15.4 | 41.6 | 61.0 | 0.445 | 6.59 | NaN | 90.0 | 0.405 | 32.0 | 23200000 |
3 | Africa | Egypt | 2015 | 31.8 | 10200 | 14.3 | 12.5 | 31.2 | 70.2 | 0.476 | 10.60 | NaN | 23.6 | 2.370 | 36.0 | 92400000 |
4 | Africa | Kenya | 2015 | 53.3 | 2800 | 21.5 | 16.3 | 41.5 | 64.7 | 0.436 | 9.06 | NaN | 46.3 | 0.341 | 25.0 | 47900000 |
dataset: tips.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/tips.csv
Source : Seaborn package, https://github.com/mwaskom/seaborn-data/blob/master/tips.csv
| | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
dataset: sp500_ibov.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/sp500_ibov.csv
Source : From the Yahoo Finance API, https://finance.yahoo.com/
| | Date | SP500 | IBOV |
---|---|---|---|
0 | 2020-01-02 | 3257.850098 | 118573.0 |
1 | 2020-01-03 | 3234.850098 | 117707.0 |
2 | 2020-01-06 | 3246.280029 | 116878.0 |
3 | 2020-01-07 | 3237.179932 | 116662.0 |
4 | 2020-01-08 | 3253.050049 | 116247.0 |
dataset: T1_reshape.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/T1_reshape.csv
Source : Kaggle, adapted from https://www.kaggle.com/berkerisen/wind-turbine-scada-dataset
| | WindSpeed(m/s) | ActivePower(kW) | Theoretical_Power_Curve (KWh) | Loss_Value(kW) | Loss(%) | count | direction |
---|---|---|---|---|---|---|---|
0 | 3.5 | 43.46 | 70.58 | 27.12 | 38.02 | 29 | N |
1 | 4.0 | 88.01 | 127.57 | 39.56 | 31.60 | 101 | N |
2 | 4.5 | 160.51 | 217.01 | 56.50 | 26.42 | 102 | N |
3 | 5.0 | 274.71 | 335.67 | 60.96 | 18.44 | 99 | N |
4 | 5.5 | 388.60 | 465.84 | 77.24 | 16.79 | 119 | N |
dataset: T1_reshape_mean.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/T1_reshape_mean.csv
Source : Kaggle, adapted from https://www.kaggle.com/berkerisen/wind-turbine-scada-dataset
| | WindSpeed(m/s) | ActivePower(kW) | Theoretical_Power_Curve (KWh) | Loss_Value(kW) | Loss(%) | count |
---|---|---|---|---|---|---|
0 | 3.5 | 52.191667 | 68.939167 | 16.745833 | 23.584167 | 56.250000 |
1 | 4.0 | 97.756667 | 127.686667 | 29.929167 | 23.646667 | 144.666667 |
2 | 4.5 | 183.601667 | 221.930000 | 38.327500 | 17.488333 | 149.500000 |
3 | 5.0 | 286.590833 | 336.426667 | 49.835000 | 14.937500 | 143.750000 |
4 | 5.5 | 400.392500 | 470.090833 | 69.699167 | 14.838333 | 167.666667 |
dataset: corona_Brasil.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/corona_Brasil.csv
Source : Corona Vírus Brasil, https://covid.saude.gov.br
| | regiao | estado | municipio | coduf | codmun | codRegiaoSaude | nomeRegiaoSaude | data | semanaEpi | populacaoTCU2019 | casosAcumulado | casosNovos | obitosAcumulado | obitosNovos | Recuperadosnovos | emAcompanhamentoNovos | interior/metropolitana |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-25 | 9.0 | 210147125.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN |
1 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-26 | 9.0 | 210147125.0 | 1.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | NaN |
2 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-27 | 9.0 | 210147125.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN |
3 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-28 | 9.0 | 210147125.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN |
4 | Brasil | NaN | NaN | 76 | NaN | NaN | NaN | 2020-02-29 | 9.0 | 210147125.0 | 2.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | NaN |
dataset: flights_delays_2015.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/flights_delays_2015.csv
Source : Kaggle, adapted from https://www.kaggle.com/usdot/flight-delays
| | arr_delay | name |
---|---|---|
0 | 11.0 | United Air Lines Inc. |
1 | 20.0 | United Air Lines Inc. |
2 | 12.0 | United Air Lines Inc. |
3 | 7.0 | United Air Lines Inc. |
4 | -14.0 | United Air Lines Inc. |
dataset: glassdoordata.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/glassdoordata.csv
Source : Glassdoor, from https://www.glassdoor.com.br/
| | jobtitle | gender | age | performance | education | department | seniority | income | bonus |
---|---|---|---|---|---|---|---|---|---|
0 | Graphic Designer | Female | 18 | 5 | College | Operations | 2 | 42363 | 9938 |
1 | Software Engineer | Male | 21 | 5 | College | Management | 5 | 108476 | 11128 |
2 | Warehouse Associate | Female | 19 | 4 | PhD | Administration | 5 | 90208 | 9268 |
3 | Software Engineer | Male | 20 | 5 | Masters | Sales | 4 | 108080 | 10154 |
4 | Graphic Designer | Male | 26 | 5 | Masters | Engineering | 5 | 99464 | 9319 |
dataset: sp500_ibov_2000.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/sp500_ibov_2000.csv
Source : From the Yahoo Finance API, https://finance.yahoo.com/
| | Date | SP500 | IBOV | YYYY-MM |
---|---|---|---|---|
0 | 2000-01-03 | 1455.219971 | 16930.0 | 2000-01 |
1 | 2000-01-04 | 1399.420044 | 15851.0 | 2000-01 |
2 | 2000-01-05 | 1402.109985 | 16245.0 | 2000-01 |
3 | 2000-01-06 | 1403.449951 | 16107.0 | 2000-01 |
4 | 2000-01-07 | 1441.469971 | 16309.0 | 2000-01 |
dataset: energy_types.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/energy_types.csv
Source : R for Data Science Online Learning Community, https://github.com/rfordatascience
| | country | country_name | type | level | 2016 | 2017 | 2018 |
---|---|---|---|---|---|---|---|
0 | BE | Belgium | Conventional thermal | Level 1 | 30728.0 | 31316.0 | 30092.635 |
1 | BE | Belgium | Nuclear | Level 1 | 41430.0 | 40128.5 | 26995.628 |
2 | BE | Belgium | Hydro | Level 1 | 1476.0 | 1360.9 | 1239.248 |
3 | BE | Belgium | Pumped hydro power | Level 2 | 1110.0 | 1093.2 | 983.190 |
4 | BE | Belgium | Wind | Level 1 | 5340.0 | 6387.9 | 7177.346 |
dataset: nightingale.csv
URL : https://raw.githubusercontent.com/Rogerio-mack/Visualizacao-de-Dados-em-Python/main/data/nightingale.csv
Source : Datasets for Visualization Construction, https://visdatasets.github.io/index.html
| | Date | Month | Year | Army | Disease | Wounds | Other | Disease.rate | Wounds.rate | Other.rate |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1854-04-01 | Apr | 1854 | 8571 | 1 | 0 | 5 | 1.4 | 0.0 | 7.0 |
1 | 1854-05-01 | May | 1854 | 23333 | 12 | 0 | 9 | 6.2 | 0.0 | 4.6 |
2 | 1854-06-01 | Jun | 1854 | 28333 | 11 | 0 | 6 | 4.7 | 0.0 | 2.5 |
3 | 1854-07-01 | Jul | 1854 | 28722 | 359 | 0 | 23 | 150.0 | 0.0 | 9.6 |
4 | 1854-08-01 | Aug | 1854 | 30246 | 828 | 1 | 30 | 328.5 | 0.4 | 11.9 |