This week's unit covers data visualization.

Course link:

[https://www.edx.org/course/the-analytics-edge]

setwd("E:\\The Analytics Edge\\Unit 7 Visualization")

Read the data

WHO = read.csv("WHO.csv")

str(WHO)
'data.frame':    194 obs. of  13 variables:
 $ Country                      : Factor w/ 194 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Region                       : Factor w/ 6 levels "Africa","Americas",..: 3 4 1 4 1 2 2 4 6 4 ...
 $ Population                   : int  29825 3162 38482 78 20821 89 41087 2969 23050 8464 ...
 $ Under15                      : num  47.4 21.3 27.4 15.2 47.6 ...
 $ Over60                       : num  3.82 14.93 7.17 22.86 3.84 ...
 $ FertilityRate                : num  5.4 1.75 2.83 NA 6.1 2.12 2.2 1.74 1.89 1.44 ...
 $ LifeExpectancy               : int  60 74 73 82 51 75 76 71 82 81 ...
 $ ChildMortality               : num  98.5 16.7 20 3.2 163.5 ...
 $ CellularSubscribers          : num  54.3 96.4 99 75.5 48.4 ...
 $ LiteracyRate                 : num  NA NA NA NA 70.1 99 97.8 99.6 NA NA ...
 $ GNI                          : num  1140 8820 8310 NA 5230 ...
 $ PrimarySchoolEnrollmentMale  : num  NA NA 98.2 78.4 93.1 91.1 NA NA 96.9 NA ...
 $ PrimarySchoolEnrollmentFemale: num  NA NA 96.4 79.4 78.2 84.5 NA NA 97.5 NA ...

Plot with the base R approach used in earlier units

plot(WHO$GNI, WHO$FertilityRate)

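As you can see, the plot above is fairly crude. Next we use the ggplot2 library. As I understand it, ggplot2 takes an object-oriented approach to plotting: you create a plot object from the data and an aesthetic mapping, then add layers to it.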

library(ggplot2)
scatterplot = ggplot(WHO, aes(x = GNI, y = FertilityRate))

Add the geom_point geometry

scatterplot + geom_point()
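To illustrate the plot-object idea, here is a minimal sketch on a small made-up data frame (the `toy` data and its values are hypothetical, not taken from WHO.csv). It assumes ggplot2 is installed and that the plot object exposes its layers as `$layers`. It shows that `ggplot()` alone produces an object with no layers, and that `+ geom_point()` returns a new object with one layer.

```r
library(ggplot2)

# Hypothetical toy data standing in for WHO.csv (values are made up)
toy <- data.frame(GNI = c(1000, 5000, 20000, 40000),
                  FertilityRate = c(5.5, 3.1, 2.0, 1.6))

# Base plot object: data plus aesthetic mapping, no geometry yet
base_plot <- ggplot(toy, aes(x = GNI, y = FertilityRate))

# Adding a layer returns a new object; base_plot itself is unchanged
with_points <- base_plot + geom_point()

length(base_plot$layers)    # no layers on the base object
length(with_points$layers)  # one layer after adding geom_point()
```

Because the base object is unchanged, the same `scatterplot` can be reused with different geometries, which is what the following steps do.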

Make a line graph

scatterplot + geom_line()

Switch back to our points

scatterplot + geom_point()

Redo the plot with a different color, size, and symbol

scatterplot + geom_point(color = "blue", size = 3, shape = 17)

scatterplot + geom_point(color = "darkred", size = 3, shape = 8) 

Add a title to the plot

scatterplot + geom_point(colour = "blue", size = 3, shape = 17) + ggtitle("Fertility Rate vs. Gross National Income")

Color the points by region

ggplot(WHO, aes(x = GNI, y = FertilityRate, color = Region)) + geom_point()
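What separates this call from the earlier `color = "blue"` is where the color is specified. Inside `aes()` it is mapped to a data column, while as an argument to `geom_point()` it is a fixed value for every point. A small sketch with made-up data (the `toy` data frame is hypothetical, and ggplot2 is assumed to be installed):

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not from WHO.csv)
toy <- data.frame(GNI = c(1000, 5000, 20000, 40000),
                  FertilityRate = c(5.5, 3.1, 2.0, 1.6),
                  Region = c("Africa", "Africa", "Europe", "Europe"))

# Fixed color: every point is blue and no legend is drawn
fixed <- ggplot(toy, aes(x = GNI, y = FertilityRate)) +
  geom_point(color = "blue")

# Mapped color: one color per Region value, with a legend
mapped <- ggplot(toy, aes(x = GNI, y = FertilityRate, color = Region)) +
  geom_point()

names(fixed$mapping)   # only the x and y aesthetics
names(mapped$mapping)  # x, y, and the colour aesthetic
```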

Is a country's fertility rate a good predictor of the percentage of its population under 15?

ggplot(WHO, aes(x = log(FertilityRate), y = Under15)) + geom_point()

Add a linear regression line to the plot:

ggplot(WHO, aes(x = log(FertilityRate), y = Under15)) + geom_point() + stat_smooth(method = "lm")
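The line drawn by `stat_smooth(method = "lm")` is a least-squares fit of the y aesthetic on the x aesthetic. The sketch below fits a model of the same form directly with `lm()`, using made-up data (the `toy` data frame and its values are hypothetical). With the real data one would presumably pass `data = WHO` instead.

```r
# Hypothetical toy data (made-up values, not from WHO.csv)
toy <- data.frame(FertilityRate = c(1.5, 2.0, 3.0, 4.5, 6.0),
                  Under15 = c(15, 20, 28, 38, 46))

# Same model form as the plot: Under15 regressed on log(FertilityRate)
model <- lm(Under15 ~ log(FertilityRate), data = toy)

coef(model)               # intercept and slope on the log scale
summary(model)$r.squared  # share of variance explained by the fit
```

Inspecting the coefficients and R-squared gives a numeric answer to the "good predictor" question that the plot only suggests visually.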
