浙大数据结构Week5

课程地址:https://www.icourse163.org/course/ZJU-93001
笔记中图片均来自此mooc中讲义。

这周是树的最后一部分内容,主要讲了堆以及哈夫曼树,下面回顾下。

Part1:回顾

1.堆(heap)

首先看一下优先队列的含义。

如果我们使用完全二叉树存储优先队列,那么就称为堆。简单来说,堆就是任意父节点小于等于或大于等于子节点的完全二叉树。


再来看下堆的抽象数据类型描述,这里只列了最大堆,最小堆同理。这部分的代码实现都会在作业中体现,故这里从略。

最后来看下最大堆的建立这个问题。

这里讲一下第二种方法,处于倒数第2层的元素最多交换1次,倒数第3层可能要交换1次至倒数第2层,之后可能再交换一次,所以1共最多要交换2次,同理倒数第k层的元素最多交换k-1次,下面计算倒数第k层的元素个数。
假设一共有$n$个元素,完全二叉树共$k$层,那么$2^{k-1}\le n< 2^k-1$,最后一层最多有$2^{k-1}$个元素,所以最后一层的元素个数小于$\frac{n}{2}$,倒数第二层的元素个数小于$\frac{n}{4}$,交换次数最多为$1$,同理倒数第$i$层的元素个数小于$\frac{n}{2^i}$,交换次数最多为$i-1$,所以对此求和即可,可以看出比依次插入效率高很多。

2.哈夫曼树(Huffman Tree)


那么如何构造哈夫曼树呢,其实很简单,只要每次把两个权值最小的两颗二叉树合并即可,理由很简单,要使WPL最小,则应该将权值最小的节点放在树的底部,来看一个例题了解这点。

再来了解下哈夫曼树的特点。

第一点是因为根据哈夫曼树构造的特点,每个节点要么是根节点,要么是两个节点的父节点,所以没有度为1的节点。
第二点是根据之前的公式$n_{0}=n_{2}+1,n_i$表示度为$i$的节点,因为哈夫曼树的$n_1=0$,所以当叶子节点$n_0=n$时,节点的总数为$n_{0}+n_{2}=2n_{0}-1=2n-1$
第三点很好理解,每次合并的时候没有左右之分。
第四点看图示即可。
这里再补充一点,哈夫曼树的WPL除了利用定义计算,还可以累加每次合并的两个节点之和来得出,这是因为如果节点a的长度为k,则其要被合并k次,因此会被累加k次,这个结论再本次作业3中会利用。

最后来看一个哈夫曼编码的问题。

不等长编码的最主要问题是二义性的问题,例如010可以表示为0,10两个编码,也可表示为010这一个编码,这里想到可以使用二叉树的叶节点表示,具体如下。

如果将字符出现的频率看做权重,字符的长度看为路径,那么上述问题可以转化为构造哈夫曼树的问题。

3.集合的表示

最后来看一个很常见的问题,集合的表示。

再来看数组表示法,用一个二维数组表示,数组的下标表示数据的位置,数组的第一个值表示数组的元素打下,第二个元素表示父节点的位置,如果该元素为根节点,则第二个元素为-k,k表示为集合元素的个数。

Part2:作业

  • 作业1

05-树7 堆中的路径(25 分)
将一系列给定数字插入一个初始为空的小顶堆H[]。随后对任意给定的下标i,打印从H[i]到根结点的路径。
输入格式:
每组测试第1行包含2个正整数N和M(≤1000),分别是插入元素的个数、以及需要打印的路径条数。下一行给出区间[-10000, 10000]内的N个要被插入一个初始为空的小顶堆的整数。最后一行给出M个下标。
输出格式:
对输入中给出的每个下标i,在一行中输出从H[i]到根结点的路径上的数据。数字间以1个空格分隔,行末不得有多余空格。
输入样例:
5 3
46 23 26 24 10
5 4 3
输出样例:
24 23 10
46 23 10
26 10

这题还是比较基础的,这里用python自己定义了一个堆class,然后按题目输入之后将序列逐个insert产生堆,然后打印即可。注意这里产生heap的方法是insert,没有采用老师说的方法。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# -*- coding: utf-8 -*-
"""
Created on Mon Apr 09 12:22:45 2018

@author: 715073608
"""

#定义堆
class Heap():
def __init__(self,x,y):
self.heap=y
self.len=len(y)
#type定义堆类型,x=1表示递增,x=0表示递减
self.type=x

def insert(self,x):
self.heap.append(x)
self.len+=1
i=self.len
if self.type==1:
while (i>=2):
#print self.heap
#不满足递增堆的定义时交换父子
if self.heap[i-1]<self.heap[i/2-1]:
self.heap[i-1],self.heap[i/2-1]=self.heap[i/2-1],self.heap[i-1]
i/=2
else:
break
#print self.heap,'1'
elif self.type==0:
while (i>=2):
#print self.heap
#不满足递减堆的定义时交换父子
if self.heap[i-1]>self.heap[i/2-1]:
self.heap[i-1],self.heap[i/2-1]=self.heap[i/2-1],self.heap[i-1]
i/=2
else:
break

def heaprd(x,i):
a=''
a+=str(x.heap[i-1])
i/=2
while i>0:
a+=' '+str(x.heap[i-1])
i/=2
return a

heap=Heap(1,[])
a=map(int,raw_input().strip().split(' '))
b=map(int,raw_input().strip().split(' '))
c=map(int,raw_input().strip().split(' '))
for i in b:
heap.insert(i)
for i in c:
print heaprd(heap,i)

  • 作业2

05-树8 File Transfer(25 分)
We have a network of computers and a list of bi-directional connections. Each of these connections allows a file transfer from one computer to another. Is it possible to send a file from any computer on the network to any other?

Input Specification:

Each input file contains one test case. For each test case, the first line contains N ($2≤N≤10^4$), the total number of computers in a network. Each computer in the network is then represented by a positive integer between 1 and N. Then in the following lines, the input is given in the format:
I c1 c2
where I stands for inputting a connection between c1 and c2; or
C c1 c2
where C stands for checking if it is possible to transfer files between c1 and c2; or S
where S stands for stopping this case.

Output Specification:

For each C case, print in one line the word “yes” or “no” if it is possible or impossible to transfer files between c1 and c2, respectively. At the end of each case, print in one line “The network is connected.” if there is a path between any pair of computers; or “There are k components.” where k is the number of connected components in this network.
Sample Input 1:
5
C 3 2
I 3 2
C 1 5
I 4 5
I 2 4
C 3 5
S
Sample Output 1:
no
no
yes
There are 2 components.
Sample Input 2:
5
C 3 2
I 3 2
C 1 5
I 4 5
I 2 4
C 3 5
I 1 3
C 1 5
S
Sample Output 2:
no
no
yes
yes
The network is connected.

这里采用之前所述的二维数组来表示集合。
先读取a表示一共有几个元素,利用字典c来存储集合,将元素对应的值初始化为-1,即每个元素表示一个集合。再定义find函数,在字典s中找元素x所在集合,如果s[x]小于0,则x为根节点,直接返回,否则递归调用find(s,s[x])。再定义union函数,首先找到两个元素对应的根节点,将根节点对应值较大(元素个数较小)的根节点指向另一个根节点,此根节点的值等于原来两个根节点值之和,之所以这样做是为了防止树的高度越来换大。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58

# -*- coding: utf-8 -*-
"""
Created on Mon Apr 09 13:13:46 2018

@author: 715073608
"""
a=int(raw_input())
b=[]
while True:
temp=raw_input().strip().split(' ')
#print temp
if temp[0]=='S':
break
else:
b.append(temp)

c={}

for i in range(1,a+1):
c[i]=-1

def find(s,x):
if s[x]<0:
return x
else:
s[x]=find(s,s[x])
return s[x]

def union(s,x,y):
root1=find(s,x)
root2=find(s,y)
#print root1,root2
if s[root1]<s[root2]:
s[root1]+=s[root2]
s[root2]=root1
else:
s[root2]+=s[root1]
s[root1]=root2

for i in b:
if i[0]=='I':
union(c,int(i[1]),int(i[2]))
else:
root1=find(c,int(i[1]))
root2=find(c,int(i[2]))
if root1==root2:
print 'yes'
else:
print 'no'
ans=0
for i in c:
if c[i]<0:
ans+=1
if ans==1:
print 'The network is connected.'
else:
print 'There are '+str(ans)+' components.'
  • 作业3

05-树9 Huffman Codes(30 分)
In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both compress the string into 14 bits. Another set of code can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] … c[N] f[N]
where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i] is the i-th character and code[i] is an non-empty string of no more than 63 ‘0’s and ‘1’s.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.
Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
Sample Input:
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No

这题的代码非常繁琐,但其实就两个步骤:首先构造一个哈夫曼树,计算最小的WPL,其次判断是否有二义性,这里判断二义性是看代码a的值是否和代码b的前len(a)项相同。
这里的代码还有改进的余地,因为最大元素点测试未通过,网上找了个c语言版本直接通过了,还是c快呀。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
# -*- coding: utf-8 -*-
"""
Created on Mon Apr 09 21:41:46 2018

@author: Administrator
"""

'''首先构造哈夫曼树,计算WPL,如果输入的WPL不等于哈夫曼树的WPL,则输出no,否则,判断是否为前缀码
如果不是前缀码,则输出YES
'''


n=int(raw_input())
a=raw_input().strip().split(' ')
b={}
c=[]
for i in range(n):
s=int(a[2*i+1])
b[a[2*i]]=s
c.append(s)

num=int(raw_input())

ans={}
score1={}
#读入
for i in range(num):
temp={}
flag=0
score=0
for j in range(n):
s=raw_input().strip().split(' ')
temp[s[0]]=s[-1]
if len(s[-1])==63:
flag=1
score+=len(s[-1])*b[s[0]]
#temp['flag']=flag
ans[i]=temp
score1[i]=(score,flag)


'''
n=7
b={'A': 1, 'B': 1, 'C': 1, 'D': 3, 'E': 3, 'F': 6, 'G': 6}
c=[1,1,1,3,3,6,6]
d={'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 0, 'F': 0, 'G': 0}
num=4
ans={0: {'A': '00000', 'B': '00001', 'C': '0001', 'D': '001', 'E': '01', 'F': '10', 'G': '11',},
1: {'A': '01010', 'B': '01011', 'C': '0100', 'D': '011', 'E': '10', 'F': '11', 'G': '00'},
2: {'A': '000', 'B': '001', 'C': '010', 'D': '011', 'E': '100', 'F': '101', 'G': '110'},
3: {'A': '00000', 'B': '00001', 'C': '0001', 'D': '001', 'E': '00', 'F': '10', 'G': '11'}}
'''
'''
n=3
b={'A':1,'B':2,'C':3}
c=[1,2,3]
d={'A':0,'B':0,'C':0}
num=2
ans={0:{'A':'10','B':'11','C':'0'},
1:{'A':'10','B':'11','C':'1'}}
'''

#判断是否有歧义
def qz(x):
for i in x:
for j in x:
if i!=j:
if isqz(x[i],x[j])==0:
return 0
return 1

#定义最小堆
class Heap(object):
#初始化,a为哨兵
def __init__(self,x,a,M):
#哨兵填入首元素
s=[a]
s+=x
self.data=s
self.len=len(x)
self.size=M

#将以第self.data[p]为根的子堆调整为最小堆
def PerDown(self,p):
#print self.data[p:],1
x=self.data[p]
parent=p
while(parent<=self.len/2):
child=2*parent
#print child,parent
#将child指向子节点中较小的值
if (child<self.len and (self.data[child]>self.data[child+1])):
child+=1
if(x<=self.data[child]):
break
else:
self.data[parent]=self.data[child]
parent=child
self.data[parent]=x
#print self.data[p:],2

def BuildHeap(self):
i=self.len
while(i>0):
#print self.data,i
self.PerDown(i)
#print self.data
i-=1

#插入元素
def insert(self,x):
if self.len==self.size:
print('已满')
#return False
else:
self.data.append(x)
self.len+=1
i=self.len
while(self.data[i/2]>x):
self.data[i]=self.data[i/2]
i/=2
self.data[i]=x
#return True

#抛出最小元素
def extract(self):
if self.len==0:
print("堆为空")
else:
start=self.data[1]
#从最后一个元素开始,找到小于
x=self.data[-1]
self.len-=1
parent=1
while(parent*2<=self.len):
child=2*parent
#将child指向子节点中较小的值
if (child<self.len and (self.data[child]>self.data[child+1])):
child+=1
if(x<=self.data[child]):
break
else:
self.data[parent]=self.data[child]
parent=child
self.data[parent]=x
del self.data[-1]
return start

#利用之前的结论计算WPL
def Huffman(x):
s=0
while x.len>1:
l=x.extract()
r=x.extract()
s+=l+r
x.insert(l+r)
return s
heap=Heap(c,0,10000)
heap.BuildHeap()
#print heap.data
#score=Huffman(heap,b.copy())
score=Huffman(heap)

def huffsum(x,y):
s=0
for i in x:
s+=len(x[i])*y[i]
return s

for i in range(num):
if score1[i][0]==score and qz(ans[i]):
print 'Yes'
else:
print 'No'