dayi的大键盘

Big Data - Notes - 1

Article links (for a possibly better reading experience):

  1. https://ovo.dayi.ink/2023/11/09/big%E6%95%B0%E6%8D%AE/
  2. https://www.cnblogs.com/rabbit-dayi/p/17822822.html
  3. https://type.dayiyi.top/index.php/archives/244/
  4. https://blog.dayi.ink/?p=93
  5. https://cmd.dayi.ink/-BShZBz_ThuNVoGepjYbsA (original link for this post; may load slowly)

1. Installation

https://cmd.dayi.ink/uploads/upload_caf6dd6c0307ae84bfb5cc0b769fa7a1.png

2. Check the three VMs' IPs

https://cmd.dayi.ink/uploads/upload_190e2470793e2d862b996018b0287444.png

Set static IPs

https://cmd.dayi.ink/uploads/upload_5571076fe51e7e022f115a7e0957cdfc.png

2.1 Set IPv4 to manual

https://cmd.dayi.ink/uploads/upload_2eccb85ea5d9af6a16970f8a9714b7bd.png

The IP addresses are as follows:

gateway: 192.168.83.2
mask: 24
master: 192.168.83.10
slv1: 192.168.83.11
slv2: 192.168.83.12
https://cmd.dayi.ink/uploads/upload_3917bfcea0afbb332d3a8718f7b548fd.png
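If you'd rather skip the GUI, roughly the same thing can be done by editing the interface config directly. A minimal sketch, assuming CentOS 6 and an interface named eth0 (yours may differ), using the master's values from the list above:

```bash
su
vi /etc/sysconfig/network-scripts/ifcfg-eth0
# set these fields (master node shown; a /24 mask is 255.255.255.0):
#   BOOTPROTO=static
#   ONBOOT=yes
#   IPADDR=192.168.83.10
#   NETMASK=255.255.255.0
#   GATEWAY=192.168.83.2
service network restart   # apply the change
```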

3. Turn off the firewall

su
[enter password: 123456]
service iptables stop
service iptables status
https://cmd.dayi.ink/uploads/upload_03e4a1f396bee69f029b92b7e43a92fc.png

===

https://cmd.dayi.ink/uploads/upload_750ed32826d55b6b06ee9235846fc560.png

slave1 slave2:

https://cmd.dayi.ink/uploads/upload_a0aaa1eb487c4f3a3ead9eb418bca2e1.png
https://cmd.dayi.ink/uploads/upload_09fff5f30ec447fa07e81482c44269bb.png

4. Disable firewall autostart

chkconfig iptables off
chkconfig --list | grep iptables

master:

https://cmd.dayi.ink/uploads/upload_58e2dfe56b0ff1c5b42154842110e5f8.png

slave1:

https://cmd.dayi.ink/uploads/upload_c8f5d2d61da4f4c5bd55af2bccd725c2.png

slave2:

https://cmd.dayi.ink/uploads/upload_16e68cba00eed8e10f0c7d1015a89bf9.png

5. Configure hostnames

#hostname
vi /etc/sysconfig/network
cat /etc/sysconfig/network
hostname dayi-bigdata-master
hostname dayi-bigdata-slave-1
hostname dayi-bigdata-slave-2

Press i to enter insert mode; type :wq to save the file.

(Turns out this was already done.)
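For reference, a sketch of what /etc/sysconfig/network should end up holding on the master (assuming the CentOS 6 layout used here; adjust HOSTNAME per node):

```bash
cat /etc/sysconfig/network
# NETWORKING=yes
# HOSTNAME=dayi-bigdata-master
```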


master:

https://cmd.dayi.ink/uploads/upload_218b4fe5c1fd0c451a74ef26164ada07.png

slave1:

https://cmd.dayi.ink/uploads/upload_2214342f82563f6dc052dc26441d03fc.png

slave2:

https://cmd.dayi.ink/uploads/upload_83e275e04c1ab7e060e706627d60a0ad.png

6. Bind IP addresses to hostnames

vi /etc/hosts
192.168.83.XX master
192.168.83.xx slave1
192.168.83.xx slave2
i : enter insert mode
:wq : save the file
https://cmd.dayi.ink/uploads/upload_7bbf40c86e395aa6c82c3022d608a845.png
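Filled in with the static IPs from the setup above (assuming .10/.11/.12 as configured there), the entries would look like this; run as root:

```bash
cat >> /etc/hosts <<'EOF'
192.168.83.10 master
192.168.83.11 slave1
192.168.83.12 slave2
EOF
ping -c 1 slave1   # quick check that name resolution works
```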

master:

https://cmd.dayi.ink/uploads/upload_31b0b7fa51e48efab9b8dec4e359a9b3.png

slave1:

https://cmd.dayi.ink/uploads/upload_6f2fd30950be33fea933e1f855c3f323.png

slave2:

https://cmd.dayi.ink/uploads/upload_5d23bcad1c9e5dfca6e0f54a607b5a2c.png

Check:

https://cmd.dayi.ink/uploads/upload_9579e3efbfe1b1eac5223b739e4e46f5.png

7. Edit the Windows hosts file

Open the file

at this path:

C:\Windows\System32\drivers\etc\

C:\Windows\System32\drivers\etc\hosts

Copy it to the desktop first, edit it there, then copy it back (great minds think alike: I came up with this first, pvp wrote it up).

https://cmd.dayi.ink/uploads/upload_98c1a3b3e7c092b4953107cd4a5d1c1e.png
https://cmd.dayi.ink/uploads/upload_4aae95f7a4f23f644c7f7e042d8a41f8.png

8. Configure remote SSH login

Oh, it just logs straight in.

https://cmd.dayi.ink/uploads/upload_511f27b369ee0f9de4073ebe2ee9c310.png

But wait, isn't it restricted to key-based login? Can't get in... oh, password login works after all. Never mind.

https://cmd.dayi.ink/uploads/upload_b2dbf3b0528c1322b3633abe1252665f.png

After adding all three hosts:

https://cmd.dayi.ink/uploads/upload_7ef9068a3692ec9063fe9529ce29b537.png

ok)

https://cmd.dayi.ink/uploads/upload_9b5b6ab4809a87399e124926abc2718b.png

(All mojibake: 锟斤拷烫烫烫)

Problem solved. Check:

  1. the Windows hosts file in C:\Windows\System32\drivers\etc\
  2. whether the VM IP addresses match

Three-node windows:

View -> Chat Window -> right-click the bottom pane, send input to all tabs

(Nested VMs make menus hard to screenshot, so no picture of that.)

https://cmd.dayi.ink/uploads/upload_bec3e42e7f5d61fa7052e429d72379f6.png

That divider doesn't seem to do anything.

(I'm guessing.)

Type

ssh-keygen

to generate an RSA key pair:

https://cmd.dayi.ink/uploads/upload_020f1dad29c39aaec27583b6e6ac7bb3.png

Copy the public key:

https://cmd.dayi.ink/uploads/upload_53d4a0dc2e1194025834b8598ccc5ac9.png

and paste it into the VM's

.ssh/authorized_keys
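If key login still fails after pasting the key in, it's usually permissions: sshd ignores an authorized_keys file that is group- or world-writable. A quick fix sketch:

```bash
mkdir -p ~/.ssh
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```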


9. Clock synchronization

9.1 Manual time sync

su root
[enter password: 123456]
date # show the system time
hwclock --show # show the hardware clock
# if they differ, the easy way is to just sync via NTP:
# set the hardware clock to follow the local clock
timedatectl set-local-rtc 1
# set the timezone to Shanghai
timedatectl set-timezone Asia/Shanghai
ntpdate -u pool.ntp.org
date
## manual sync
date -s 20230907
date -s 9:40
hwclock --systohc
https://cmd.dayi.ink/uploads/upload_bcc340ecd4afbbe6eb0a9e13e728800d.png

Show the system time

https://cmd.dayi.ink/uploads/upload_44e9def235bc0343c7799be744514711.png

Show the hardware clock:

https://cmd.dayi.ink/uploads/upload_fe270fce80d588d7557b1c0eb6536f9a.png

(All mojibake)

Sync time via NTP

https://cmd.dayi.ink/uploads/upload_f517b7f9319bef872dbe2a9ec87f1e47.png

9.2 Automatic time sync

Knocking off for the day.

2023-09-13 14:09:57, syncing time:

timedatectl set-local-rtc 1
# set the timezone to Shanghai
timedatectl set-timezone Asia/Shanghai
ntpdate -u ntp.aliyun.com
hwclock --systohc
https://cmd.dayi.ink/uploads/upload_01ab9266104f4c5d4754eef02ec59892.png

9.3 Configure automatic time sync

On the two slave nodes:

crontab -e
# press i to enter insert mode
# ctrl+shift+v to paste
0 1 * * * /usr/sbin/ntpdate ntp.aliyun.com
# type :wq to save
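To sanity-check the entry (the schedule means 01:00 every day), something like this, run as the same user that edited the crontab:

```bash
crontab -l                            # should list the ntpdate line
/usr/sbin/ntpdate -u ntp.aliyun.com   # one manual sync (needs root) to confirm it works
```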

10. Passwordless login

# just type exit to drop out of root
# or
su dayi
https://cmd.dayi.ink/uploads/upload_0bddd3d48226349c19575763ef622d43.png

10.1 Set up passwordless SSH

ssh-keygen -t rsa
# then just press Enter three times
https://cmd.dayi.ink/uploads/upload_196948acbbce90dc8bca9e0210bf1d46.png
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
cat ~/.ssh/authorized_keys
https://cmd.dayi.ink/uploads/upload_a15e3cbd36743a0ab54512a7c49de1ea.png

I thought this public key might be worth something: (probably not anymore) ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA6yR468UQBZ/9KSG71FD0UVlv9N0I6q2RfA94yLT7uGhja9vzBJSP9mDg8RF8+Z6p5+katfYE7YLzjtLNMtC5lTjkwW8WHFyCGUP0QEcAIH0ZdDVn3nwHG9k+b2XfpLNKOieWYqUoixRSzIecUd5iq3WDe4dUjgBmGhfouo+FtQob/q8OOg2iJszl86ad8dE9W2BRS3VU5q6/OZmPp8uJcfXAl4/bHJq56+FNPSwk9b+umAsiH+bqeVCkW6JJd/Tw7DGkhYACGxleF5onBtiLKwbMZ+RanWiFm9AZqod86rcmZ9IPYaWf/QXgyun5vrNBBgBT+a8CBsRoBpFk0X7CCw== dayi@dayi-bigdata-master


ssh-copy-id dayi-bigdata-slave-1
ssh-copy-id dayi-bigdata-slave-2
https://cmd.dayi.ink/uploads/upload_b1f7f8025a39effc7225e962606045b4.png

10.2 Test passwordless login

ssh dayi-bigdata-slave-1
# exit with exit or ctrl+d
ssh dayi-bigdata-slave-2
# exit with exit or ctrl+d
https://cmd.dayi.ink/uploads/upload_79725834652a87e23fcbbfd157be8e23.png

11. Install the JDK (normal user)

Create a folder

cd ~
mkdir -p ~/resources
cd ~/resources

Copy the file over: sftp://dayi-bigdata-master sftp://dayi-bigdata-slave-1 sftp://dayi-bigdata-slave-2

https://cmd.dayi.ink/uploads/upload_c6b5a90c785f36f63efe7a3d65a8c59f.png
https://cmd.dayi.ink/uploads/upload_894881ddda93907d0d1120c2721d0414.png

Check the current files

ls 

[dayi@HOSTNAME=dayi-bigdata-slave-2 resources]$ ls
jdk-7u71-linux-x64.gz
[dayi@HOSTNAME=dayi-bigdata-slave-2 resources]$ 

Extract

tar -zxvf jdk-7u71-linux-x64.gz
# rename
mv jdk1.7.0_71 jdk

11.1 Configure environment variables

# (optional; ignore any error) move files to match the instructor's layout
mv ~/Home/resources ~/resources
vim ~/.bash_profile
# press i to edit
export JAVA_HOME=~/resources/jdk
export PATH=.:$JAVA_HOME/bin/:$PATH
# type :wq to save
source ~/.bash_profile

11.2 Verify the JDK install

java -version
https://cmd.dayi.ink/uploads/upload_fed7ae4c5abfe916246eb74464621e26.png

12. Install Hadoop (master, normal user)

Master node only.

tar -zxvf hadoop-2.6.4.tar.gz 
mv hadoop-2.6.4 hadoop

Edit the bash profile


# remember: as the normal user
vim ~/.bash_profile

# press i to edit

# ctrl+shift+v to paste in insert mode
export HADOOP_HOME=~/resources/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

# type :wq to save
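After saving, reload the profile and check that the hadoop binary is on PATH, same pattern as the JDK check earlier:

```bash
source ~/.bash_profile
hadoop version   # should report Hadoop 2.6.4
```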

Connect with Notepad++

  • Open Notepad++
https://cmd.dayi.ink/uploads/upload_02c0fda58df39aa774d00b5fccdf7ca9.png

Fill in the connection profile

https://cmd.dayi.ink/uploads/upload_eb5a336c83e455fd052e8eaacd2f7442.png

Click this

https://cmd.dayi.ink/uploads/upload_daab8bdc8d225639e2000c55854ec6d8.png

Then navigate here:

/home/dayi/resources/hadoop/etc/hadoop

https://cmd.dayi.ink/uploads/upload_06cc1292c6591a47949efd12d330f056.png

Edit hadoop-env.sh

hadoop-env.sh, line 25:

export JAVA_HOME=~/resources/jdk
https://cmd.dayi.ink/uploads/upload_21a778f568c7b3668d5b2ddac9440cd0.png

Edit yarn-env.sh

yarn-env.sh

  • line 23:
export JAVA_HOME=~/resources/jdk
https://cmd.dayi.ink/uploads/upload_befdd8e34f09d8b74740c5d1267a7997.png

Edit core-site.xml

core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/wg/resources/hadoopdata</value>
  </property>
</configuration>
https://cmd.dayi.ink/uploads/upload_1c11f4bbf48385e78f47922963c833f3.png

Create the data folder

# remember to adjust the path for your own username
mkdir -p /home/dayi/resources/hadoopdata

Edit hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
https://cmd.dayi.ink/uploads/upload_678da0687a0a80114a0a37be84b2614e.png

ctrl+s

Edit yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:18141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:18088</value>
  </property>
</configuration>
https://cmd.dayi.ink/uploads/upload_f76c59f4879be9a09434d17570e54eca.png

Edit mapred-site.xml

  • Open mapred-site.xml.template

  • Create a new file:

https://cmd.dayi.ink/uploads/upload_c0450268e4abfc267a8882917d7e73dc.png
  • Enter the contents:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
https://cmd.dayi.ink/uploads/upload_1da6037a9a99abd283dec89724d7cd06.png

Open the slaves file

Add:

slave1
slave2
https://cmd.dayi.ink/uploads/upload_06a1304ce67504f795588ca32786720f.png

Send the configured hadoop to the slave nodes

cd ~
cd resources

scp -r hadoop dayi-bigdata-slave-1:~/resources/hadoop/
scp -r hadoop dayi-bigdata-slave-2:~/resources/hadoop/
https://cmd.dayi.ink/uploads/upload_c9abe9d733b586e0bd12ecbb30c7f60b.png

13. Start Hadoop

  1. Format the filesystem (master, normal user)
hadoop namenode -format
https://cmd.dayi.ink/uploads/upload_e0c39596dd02c946608a49d7ca76bf9e.png
https://cmd.dayi.ink/uploads/upload_32abd532745d1e5a7bd4bf65418d5a2e.png
  2. Start Hadoop
cd ~/resources/hadoop/sbin/

./start-all.sh

~/resources/hadoop/sbin/./start-all.sh
https://cmd.dayi.ink/uploads/upload_a8ce97dae667390df7b53ab7c1103011.png
  3. Check
jps
https://cmd.dayi.ink/uploads/upload_2f4857b9d884ee877e5e5eb58c3e4579.png
https://cmd.dayi.ink/uploads/upload_ff961aca6b85968ffd7d1d4a4785aaeb.png
https://cmd.dayi.ink/uploads/upload_3ea27d0ff76b9d43e8f840ed6893c38f.png
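For reference, the daemons jps should show with this setup (standard Hadoop 2.x roles; the pid numbers will differ):

```bash
jps
# on master: NameNode, SecondaryNameNode, ResourceManager, Jps
# on slaves: DataNode, NodeManager, Jps
```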

http://master:50070

https://cmd.dayi.ink/uploads/upload_8304eeeee2d3c1b5cdf8b40e40dea7b9.png
https://cmd.dayi.ink/uploads/upload_c06e83eeaf8091222204f0c15775345a.png

13.1 Verify

#start
~/resources/hadoop/sbin/./start-all.sh

http://master:50070

13.2 Repair

~/resources/hadoop/sbin/./stop-all.sh

rm -rf ~/resources/hadoopdata
rm -rf ~/resources/hadoop/logs

hadoop namenode -format

~/resources/hadoop/sbin/./start-all.sh

Result:

https://cmd.dayi.ink/uploads/upload_30830911ef03e4216d25b69e693d2808.png

14. HDFS

First make sure at least two of your datanodes are up.

https://cmd.dayi.ink/uploads/upload_8a1089dc668b1beecf08e075baab780c.png
  1. Browse files
hadoop fs -ls /

Not much there yet.

https://cmd.dayi.ink/uploads/upload_1b74ff752529062a3a63279b0a459f9e.png

You can also browse here:

http://master:50070/explorer.html#/

https://cmd.dayi.ink/uploads/upload_cd5839ac57f46b05e4a25953d026663b.png
  2. Create directories
# create directories
hadoop fs -mkdir /ovo
hadoop fs -mkdir /a

# list /
hadoop fs -ls /
https://cmd.dayi.ink/uploads/upload_7a546dfbd1a73edbc2d4772ab02cc697.png

View the files from the web UI

https://cmd.dayi.ink/uploads/upload_acae6eab1496cf20dac1f63c100ff487.png
  3. Upload files

1) Create a local file

(You can also do this straight from the web UI.)

```bash
cd ~
mkdir b
ls -alh b
cd b
echo ovo>>ovo.txt

# press i to edit, then Esc and :wq to save
vi test
```

![](https://cmd.dayi.ink/uploads/upload_c1cafce09bba6934b540c701cfa2c204.png)

List the directory and view the file:

```bash
ls -alh
cat test
```

![](https://cmd.dayi.ink/uploads/upload_1d3f01c459a875aa64618932d49293d8.png)

2) Upload test to the filesystem

cd ~/b
hadoop fs -put test /a
hadoop fs -ls /a
# or check from the web UI
https://cmd.dayi.ink/uploads/upload_e00ead97e522725cdd55bdfe43d26a3b.png
https://cmd.dayi.ink/uploads/upload_296c8d52b0846fa3183800dfe957ff5e.png
https://cmd.dayi.ink/uploads/upload_b57636c5fd46e95e45e261f2f589db51.png
  4. View file contents
hadoop fs -cat /a/test
hadoop fs -text /a/test
https://cmd.dayi.ink/uploads/upload_891d0708cf1ac3fcfa3a59cad4c7c6a3.png
  5. Download files
hadoop fs  -get /a/test test1
ls -alh
cat test1
https://cmd.dayi.ink/uploads/upload_38b05c27c3379f65766720d9c6f02733.png
  6. Change file permissions
hadoop fs -ls /a
hadoop fs -ls /

hadoop fs -chmod <mode> <path>
https://cmd.dayi.ink/uploads/upload_af6442618867245f419b04407b48fa82.png
https://cmd.dayi.ink/uploads/upload_8e645dfefdac8b6633e7d800308bafc7.png
-rw-r--r--

The leading character marks the type: - for a file, d for a directory.

rwx : read, write, execute

rw- : owner permissions (u)
r-- : group permissions (g)
r-- : other permissions (o)

Add execute permission for the owner:

hadoop fs -chmod u+x /a/test
hadoop fs -ls  /a
https://cmd.dayi.ink/uploads/upload_74770aa619419bcd3a5406d2017c0089.png

Add execute for the group and for others:

hadoop fs -chmod o+x /a/test
hadoop fs -chmod g+x /a/test
hadoop fs -ls  /a
https://cmd.dayi.ink/uploads/upload_4c2d2d9e5aecff2865a6cf75d09e3f2f.png

Remove execute permission for everyone

hadoop fs -chmod a-x /a/test
hadoop fs -ls  /a
https://cmd.dayi.ink/uploads/upload_e1ccea7f2ec40532546ccf0d78dd6371.png

Numeric modes:

Each rwx triple is three bits (000 through 111); read the binary as decimal and combine freely.

For example, full permissions:

rwx = 111 = 4+2+1 = 7

so 777 is rwxrwxrwx, and 644 is 110 100 100 = rw-r--r--.

Set permissions numerically:

hadoop fs -chmod 644 /a/test

# rather dangerous: world-writable files are an easy way to get hacked
hadoop fs -chmod 777 /a/test

hadoop fs -ls  /a

hadoop fs -chmod 644 /a/test
https://cmd.dayi.ink/uploads/upload_ccce68823419a8155d113c2af5c1d226.png
  7. Delete a file
hadoop fs -rm /a/test
https://cmd.dayi.ink/uploads/upload_0c4e84ab591226eba2b5eb0411e30b17.png
  8. Delete a directory
hadoop fs -rm -r /a
hadoop fs -ls /
https://cmd.dayi.ink/uploads/upload_a44800415cca16244e42022a9c7e2945.png

15. Installing MySQL

  1. Create the mydb folder

As the normal user:

mkdir ~/resources/mydb
  2. Copy the install files in

If you can paste straight into the VM, you don't need this.

The tool here is FileZilla.

https://cmd.dayi.ink/uploads/upload_03322ee41ebb371e6414432452fce230.png
https://cmd.dayi.ink/uploads/upload_9acef96558484b3068a77e8cde97305f.png

How to connect:

https://cmd.dayi.ink/uploads/upload_ec9e8a54539297ede8890918baef7228.png
  3. Check whether the system ships with MySQL
su

rpm -qa | grep -i mysql
https://cmd.dayi.ink/uploads/upload_9a73b8b50b7d6f8fe39e23b892e61145.png

There's one.

  4. If so, remove it

(Take a snapshot first, just in case.)

rpm -e mysql-libs-5.1.71-1.el6.x86_64 --nodeps

# then check whether any are left

rpm -qa | grep -i mysql
https://cmd.dayi.ink/uploads/upload_3d69551f3d74b0b9eaf58bdee60d0719.png
  5. Install the 4 packages

common, libs, client, server

# (start from the normal user)
su dayi  # use your own username
cd ~
su
cd resources/mydb
ls -al
https://cmd.dayi.ink/uploads/upload_1f394edc12be713937f73a33ae30aa41.png

Install:

rpm -ivh mysql-community-common-5.7.13-1.el6.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.13-1.el6.x86_64.rpm
rpm -ivh mysql-community-client-5.7.13-1.el6.x86_64.rpm
rpm -ivh mysql-community-server-5.7.13-1.el6.x86_64.rpm
https://cmd.dayi.ink/uploads/upload_cea4f02ae586cfe05341b19f03f52681.png
  6. Start the server
service mysqld start 
https://cmd.dayi.ink/uploads/upload_6e2b302777dae58bf1ca204c02dd53c9.png

Change the MySQL default password


service mysqld start
sudo cat /var/log/mysqld.log | grep 'temporary password' 
https://cmd.dayi.ink/uploads/upload_ae8c2eac2628a4ba5b90aeca770301ce.png
https://cmd.dayi.ink/uploads/upload_7f122fb63fdcfdd50cae8270c55d4730.png
mysql -uroot -p
# log in with the temporary password from the log above
https://cmd.dayi.ink/uploads/upload_dc67226a8b20ad33d67db4d523ca962c.png

Change the password. Username: root, password: wg1%Bigdata

ALTER USER 'root'@'localhost' identified by 'wg1%Bigdata';

If it replies OK, you're set.

https://cmd.dayi.ink/uploads/upload_82b41ef07045ff02c80f38f950137e32.png

Create a user in MySQL

Username: hadoop, password: hadoop1%Bigdata

Grant all on *.* to 'hadoop'@'%' identified by 'hadoop1%Bigdata';
grant all on *.* to hadoop@'localhost' identified by 'hadoop1%Bigdata';
grant all on *.* to hadoop@'master' identified by 'hadoop1%Bigdata';
flush privileges;
https://cmd.dayi.ink/uploads/upload_af10a61ba5ee64f6e26e28f01f163746.png

Try logging in

quit;
su dayi
mysql -uhadoop -p"hadoop1%Bigdata"
https://cmd.dayi.ink/uploads/upload_90d03010756b402ffe49a3e68a1abcea.png

Create the Hive metastore database in MySQL, named hive1

Show the current databases:

show databases;
https://cmd.dayi.ink/uploads/upload_bbb3c1ed7d46573f0c1aa2dc1a035557.png

Create the database

create database hive1;
show databases;
quit;
https://cmd.dayi.ink/uploads/upload_9ab7709c6f66bc4708a1532adaab4d02.png

16. Install Hive

1. Copy the Hive tarball into resources

https://cmd.dayi.ink/uploads/upload_cb07151af4a8965cadd6dc782926e1b6.png

2. Extract

# enter the directory
cd ~/resources
# extract
tar -zxvf apache-hive-1.2.1-bin.tar.gz
# rename (move)
mv apache-hive-1.2.1-bin hive
https://cmd.dayi.ink/uploads/upload_eb99370865218fc85e00dc7dedc4cbca.png

3. Edit the config with Notepad++

Connect:

https://cmd.dayi.ink/uploads/upload_9aad28a640ee7ffdc86aa98ef9717c87.png

Open the directory ~/resources/hive/conf

https://cmd.dayi.ink/uploads/upload_ea36a051d95148c2272deaf469e09cdd.png

Right-click the folder and create a new file, hive-site.xml

https://cmd.dayi.ink/uploads/upload_ba1bd851b4ea0ad907b2d2b8b64e8e41.png

Contents:

<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive1?characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoop1%Bigdata</value>
  </property>
</configuration>
https://cmd.dayi.ink/uploads/upload_113f79518fadf7ec5d5ab5c58447aa09.png

If your hostname is different, remember to change this line: jdbc:mysql://master:3306/hive1?characterEncoding=UTF-8

jdbc:mysql://master:3306/hive1?characterEncoding=UTF-8
driver      host    port database_name

hadoop is the username
hadoop1%Bigdata is the password

4. Copy the JDBC driver

mysql-connector-java-5.1.42-bin.jar -> ~/resources/hive/lib

https://cmd.dayi.ink/uploads/upload_0b22faa729870a3ad5947d7d0883d146.png

5. Configure bash

Set the environment variables

  • in Notepad++:
    https://cmd.dayi.ink/uploads/upload_9979b4ac33808d7357daff8c61fb1a29.png
    https://cmd.dayi.ink/uploads/upload_9979b4ac33808d7357daff8c61fb1a29.png
vim ~/.bash_profile

# add these two lines
export HIVE_HOME=~/resources/hive
export PATH=$PATH:$HIVE_HOME/bin

# reload the profile
source ~/.bash_profile
https://cmd.dayi.ink/uploads/upload_c63b2c1911f60d97f5074d3f2ae49572.png

Then check the Hive version; if it prints, the environment is set up correctly.

hive --version
https://cmd.dayi.ink/uploads/upload_070a2ef97b8da877ec5d7abb010c8468.png

(Acting all prophetic.)

6. Replace the jline jar

rm ~/resources/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar
cp ~/resources/hive/lib/jline-2.12.jar ~/resources/hadoop/share/hadoop/yarn/lib/
https://cmd.dayi.ink/uploads/upload_3c11b9993b4a75105360ff9963ccf0fe.png

7. Start Hive

#1. download Genshin Impact (hadoop)
~/resources/hadoop/sbin/./start-all.sh

#2. install Genshin Impact (mysqld)
service mysqld start

#3. launch Genshin Impact
hive
https://cmd.dayi.ink/uploads/upload_95fbaa69810103d73923a3d43e8c6106.png
https://cmd.dayi.ink/uploads/upload_319349f84cb51190db305b5c924c8e1b.png

Genshin Impact:

https://cmd.dayi.ink/uploads/upload_702d1f6be6e7ae3edd8784a3730d0df3.png
https://cmd.dayi.ink/uploads/upload_fe0151df5cc23d06cd97626a7d5ef2ee.png
https://cmd.dayi.ink/uploads/upload_fec93a63f59e4f6d3e696f0f24030abb.png

8. Startup recap

~/resources/hadoop/sbin/./start-all.sh
service mysqld start
hive
https://cmd.dayi.ink/uploads/upload_458386ae07c2c5dd962afafe7c096207.png
https://cmd.dayi.ink/uploads/upload_f2e2b2f5513d2870dbecd4c72bc4b05b.png

SecureCRT tweaks:

https://cmd.dayi.ink/uploads/upload_693835ec8ffaffebbe63dfa4a139c01a.png
https://cmd.dayi.ink/uploads/upload_5850e9d7b011141f359d185d8a81f3c7.png
https://cmd.dayi.ink/uploads/upload_6953db8c3405adb28301a848cc2ec4da.png

17. Using Hive

hive> quit;

1. Create a file

cd ~
mkdir cc
cd cc
vi stu

Press i to edit; Esc then :wq to save.
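The actual rows are only in the screenshot below; as a stand-in, here is a plausible sample with tab-separated name/age/sex columns (hypothetical data, matching the table schema created later):

```bash
# columns must be separated by literal tabs, to match the '\t' delimiter used below
printf 'zhangsan\t20\tmale\nlisi\t19\tfemale\nwangwu\t21\tmale\n' > ~/cc/stu
cat ~/cc/stu
```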

https://cmd.dayi.ink/uploads/upload_3efb8ec3e51914db69837e95cee63958.png

View the file

cat stu
https://cmd.dayi.ink/uploads/upload_99ba9f99470f2987a1221427f86fa003.png

2. Start Hive

hive

Commands:

  • Show databases

    show databases;
    https://cmd.dayi.ink/uploads/upload_601fb939defd9abcce5962d729b7bb75.png
  • Create a database

    create database studb;
    show databases;
    https://cmd.dayi.ink/uploads/upload_61ffc5903870ebf180782638513aa73c.png
  • Switch databases

    use studb;
    https://cmd.dayi.ink/uploads/upload_87c0442907961c25db750612235bfde2.png
  • List tables in the current database

    show tables;
    https://cmd.dayi.ink/uploads/upload_8563ea6274bf5de8bdaa102df150c0f3.png
  • Create a table, with \t separating the columns

    create table if not exists stu
    (name string,age int,sex string)
    row format delimited fields terminated by '\t';
    show tables;
    https://cmd.dayi.ink/uploads/upload_aca24e73427a7ca1b430a0b12b89b911.png

    View the files: http://master:50070/explorer.html#/user/hive/warehouse/studb.db/

    https://cmd.dayi.ink/uploads/upload_5eac76829e01953b7b7bebaba33c0e9c.png
  • Show the table's creation statement and details

    show create table stu;
    https://cmd.dayi.ink/uploads/upload_b97aebf30a18486349955c26ed1aaa8f.png
  • Describe the table structure

    desc stu;
    https://cmd.dayi.ink/uploads/upload_b3273cefff482dbbb8b7d97c230887fe.png

3. Load data

Inside hive:

  • Load data from a file

    load data local inpath '/home/dayi/cc/stu' into table stu;
    https://cmd.dayi.ink/uploads/upload_b54cf29cdefa8d4c47a3edd07a2cd674.png
  • Query the table

    select * from stu;
    https://cmd.dayi.ink/uploads/upload_2b0da944876242ed9129a8d0fd9c4189.png
  • Drop the table

    drop table stu;
    https://cmd.dayi.ink/uploads/upload_a227d31cc09eedca59582aa38730bb65.png
  • Drop the database and run

    If the database has no tables (empty):

    drop database studb;
    https://cmd.dayi.ink/uploads/upload_f7263682272e9b7cc13e4f982e84b0c0.png

    If it still has tables:

    drop database studb cascade;

4. Big data analysis - Hive, part 2

The Sogou search dataset:

https://cmd.dayi.ink/uploads/upload_fac0347b585f5400734638d293b3b554.png
ts      search timestamp
uid     user id
kw      keyword the user searched
rank    rank of the result
orders  number of clicks
url     URL visited

Create the database

create database sogoudb;
show databases;
https://cmd.dayi.ink/uploads/upload_2304485c5b31261c1a0b8d4909709d1a.png

Use the database

use sogoudb;

Create an external table

create external table if not exists sogou
(ts string,uid string,kw string,rank int,orders int,url string)
row format delimited fields terminated by '\t'
location '/s1/sogoudata';
https://cmd.dayi.ink/uploads/upload_9d7b602869a28c19144b2ec9860bd77b.png

Inspect the table

show tables;
desc sogou;
https://cmd.dayi.ink/uploads/upload_9b32fdd63bbabaa7c6effc89bd5881ce.png

Loading data, method 2

Drop this 500 MB file (sogou.500w.utf8) into the cc directory (~/cc).

https://cmd.dayi.ink/uploads/upload_6a200b62909b5bdd5c7c5dda59444de9.png
https://cmd.dayi.ink/uploads/upload_f7d08db3b7f8b68602d15e27f494b0a4.png

Check the file

cd ~/cc
ls -alh
https://cmd.dayi.ink/uploads/upload_bea0bd9963c1e3dbb804b7e5796f0316.png

Upload the file

#upload (this is slow)
hadoop fs -put sogou.500w.utf8 /s1/sogoudata
#check
hadoop fs -ls /s1/sogoudata
https://cmd.dayi.ink/uploads/upload_49b89f41009ffc74ba132d989ea48ce2.png

http://dayi-bigdata-master:50070/explorer.html#/s1/sogoudata

https://cmd.dayi.ink/uploads/upload_b87ff4a4ffd60a5b8d7dd95e8163d2c6.png

Back into Hive to import

Inside hive:

use sogoudb;

Query the first three records:

select * from sogou limit 3;
https://cmd.dayi.ink/uploads/upload_337d1b3166375aadc00149effaff4a6f.png
https://cmd.dayi.ink/uploads/upload_a9650467d3132799325aded169fe34be.png

Count the total records in the table

select count(*) from sogou;
https://cmd.dayi.ink/uploads/upload_1db572d5bd35bc8d5ecde8cfafea3cff.png
https://cmd.dayi.ink/uploads/upload_1d520d8cf18568fce87881fa82a11533.png

Super slow.

https://cmd.dayi.ink/uploads/upload_805bfa3c1633fa21d2c1c301aecc5014.png
https://cmd.dayi.ink/uploads/upload_e46611c3cc21d62c17b14f7a12d5b6c9.png
https://cmd.dayi.ink/uploads/upload_29d60cca596335b2b19f24b37767fa2b.png

193 seconds:

https://cmd.dayi.ink/uploads/upload_447370b177f7c1133147c7d67fcdab9a.png

Count distinct UIDs

select count(distinct(uid)) from sogou;

Still running...

https://cmd.dayi.ink/uploads/upload_f7517abd82aeccab04c2540f036e7824.png

177 seconds

https://cmd.dayi.ink/uploads/upload_45d2c75515f441b0b316ef4feddff634.png
select count(uid) from sogou 
group by uid;
https://cmd.dayi.ink/uploads/upload_a9277135bf51c183d4e1a32c4c353b10.png

Done:

https://cmd.dayi.ink/uploads/upload_b4c5498ca98636c626e9ad1ff4d6a1d6.png

Average queries per user

$\text{average} = \text{total queries} / \text{number of users}$

Build a temporary table a:

uid  cnt
a    10
b    20

total queries = sum(a.cnt); number of users = count(a.uid)

Temporary table a:

select uid,count(*) as cnt
from sogou
group by uid;

Full query:

select sum(a.cnt)/count(a.uid) from
(select uid,count(*) as cnt
from sogou
group by uid) as a;
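The same number can be computed without the temporary table; a sketch of an equivalent one-liner (wrapped in hive -e so it can be run straight from the shell):

```bash
hive -e "use sogoudb; select count(*)/count(distinct uid) from sogou;"
```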

Fire:

https://cmd.dayi.ink/uploads/upload_f6ebd867cd7ec4d028a9ca15542736eb.png
https://cmd.dayi.ink/uploads/upload_02f32fd9f7efc787c9b03d2d7fdbaf1f.png

So slow:

https://cmd.dayi.ink/uploads/upload_3867e441e8349c887a5bdd23ae86f8ed.png

It's out:

https://cmd.dayi.ink/uploads/upload_61fe62b072b88cae2c8947b852b065fa.png

3.6964094557111005

Top 10 most frequent keywords

SELECT kw, COUNT(kw) AS cnt
FROM sogou
GROUP BY kw
ORDER BY cnt DESC
LIMIT 10;
https://cmd.dayi.ink/uploads/upload_8d114237c3444956541187c1f72549a7.png

That URL:

https://cmd.dayi.ink/uploads/upload_c823147de0a6219c953ba96476bf8c8f.png
https://cmd.dayi.ink/uploads/upload_6a9492f1041d8672e4a4dec544e54e28.png

So slow, hahaha:

https://cmd.dayi.ink/uploads/upload_2577c6a727418a052dd2d4d4576eed2f.png

Done:

https://cmd.dayi.ink/uploads/upload_79e511a2b1c8895134a106c9c9a2e75d.png

18. ZooKeeper

1. Copy the tarball and extract

cd resources
tar -zxvf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper
https://cmd.dayi.ink/uploads/upload_eb38343e8fcfed5bce745e842a4cd6c6.png

2. In resources/zookeeper/conf, create the file zoo.cfg:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

server.1 - the server id

master - the server hostname

2888 - peer communication port

3888 - leader-election port

https://cmd.dayi.ink/uploads/upload_2a53ee7b9ad4700820044ca57bb74465.png

3. Create the myid file

myid identifies each data node.

Location: dataDir=/tmp/zookeeper

# note: /tmp lives in memory, so this is lost on reboot
mkdir -pv /tmp/zookeeper
cd /tmp/zookeeper
vi myid
type 1
then Esc + :wq

# given that, it's easier to write it like this:

#master
mkdir -pv /tmp/zookeeper && cd /tmp/zookeeper && echo 1 > myid
cat /tmp/zookeeper/myid

#slave1
mkdir -pv /tmp/zookeeper && cd /tmp/zookeeper && echo 2 > myid
cat /tmp/zookeeper/myid
#slave2
mkdir -pv /tmp/zookeeper && cd /tmp/zookeeper && echo 3 >myid
cat /tmp/zookeeper/myid
https://cmd.dayi.ink/uploads/upload_7654d00faf9b506d03ed265371160dbd.png
https://cmd.dayi.ink/uploads/upload_8c91bca8cc3a6d4e28d85071207d2e6b.png
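Since /tmp is wiped on reboot, the myid files (and all ZK data) disappear with it. An optional sketch of making this persistent instead; it assumes you also change dataDir in zoo.cfg to match:

```bash
# on each node, pick a persistent directory and recreate myid there
mkdir -p ~/zkdata
echo 1 > ~/zkdata/myid   # use 2 on slave1, 3 on slave2
# then in zoo.cfg set: dataDir=/home/dayi/zkdata
```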

The server blew up and the VMs are gone.

https://cmd.dayi.ink/uploads/upload_abd784632bbe6f7e2b5ae5d1b58ff454.png

4. Send zookeeper to the slave nodes


cd ~/resources/
ls

scp -r zookeeper slave1:~/resources/
scp -r ~/resources/zookeeper slave2:~/resources/

5. Send .bash_profile over

scp -r ~/.bash_profile slave1:~/
scp -r ~/.bash_profile slave2:~/

# then run on each slave:
source ~/.bash_profile

6. Start ZooKeeper

cd ~/resources/zookeeper/bin/
ls
./zkServer.sh start

# one-liner
~/resources/zookeeper/bin/./zkServer.sh start
https://cmd.dayi.ink/uploads/upload_0d9b5be147153995452ac26c7c3896a2.png
https://cmd.dayi.ink/uploads/upload_75c392f9c739f5c3dd788a230bff92dd.png
https://cmd.dayi.ink/uploads/upload_a0d6cba93df64a8f46664d2a7790705b.png

It caught up.

https://cmd.dayi.ink/uploads/upload_ea2182d3f1686827bfb66b03b4318388.png

Verify startup

./zkServer.sh status

This is what healthy looks like:

https://cmd.dayi.ink/uploads/upload_f041e8d7886b06b284d98961118697ce.png

Broken:

https://cmd.dayi.ink/uploads/upload_127dc51c9affdb5e11386c57a7528673.png

Debugging (zkServer.sh print-cmd)

"/home/dayi/resources/jdk/bin/java" -Dzookeeper.log.dir="." -Dzookeeper.root.logger="INFO,CONSOLE" -cp "/home/dayi/resources/zookeeper/bin/../build/classes:/home/dayi/resources/zookeeper/bin/../build/lib/*.jar:/home/dayi/resources/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/dayi/resources/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/home/dayi/resources/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/home/dayi/resources/zookeeper/bin/../lib/log4j-1.2.16.jar:/home/dayi/resources/zookeeper/bin/../lib/jline-0.9.94.jar:/home/dayi/resources/zookeeper/bin/../zookeeper-3.4.6.jar:/home/dayi/resources/zookeeper/bin/../src/java/lib/*.jar:/home/dayi/resources/zookeeper/bin/../conf:"  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain "/home/dayi/resources/zookeeper/bin/../conf/zoo.cfg"

Then copy that command and run it directly (yours will differ from mine)

to see what the error actually is:

https://cmd.dayi.ink/uploads/upload_07b0e3049502d9e9bcc8d6f9af84da8a.png

If it errors, check whether this file is sane:

/tmp/zookeeper/myid

19. Install HBase

1. Copy the file and extract

cd ~/resources/
tar -zxvf hbase-0.98.9-hadoop2-bin.tar.gz
mv hbase-0.98.9-hadoop2 hbase
https://cmd.dayi.ink/uploads/upload_25abda27c168b293eaccb17847045b90.png

2. Edit the config files

File 1:

~/resources/hbase/conf/hbase-env.sh

Line 29:

https://cmd.dayi.ink/uploads/upload_02e95db00d91b4a632fb24bd5a137f25.png

Get JAVA_HOME with:

echo $JAVA_HOME

Line 124:

https://cmd.dayi.ink/uploads/upload_41b57424e5c6d6681e0023b48c315e15.png

true : use HBase's bundled ZK (if your own zookeeper failed to start)

false : don't use the bundled ZK (ours started fine last class)
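Concretely, the two edited lines would end up something like this (HBASE_MANAGES_ZK is the stock variable name in hbase-env.sh; the line numbers are for this HBase version):

```bash
# line 29: point at the same JDK as before
export JAVA_HOME=~/resources/jdk
# line 124: we run our own zookeeper, so don't let HBase manage one
export HBASE_MANAGES_ZK=false
```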

File 2:

~/resources/hbase/conf/hbase-site.xml

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
</configuration>
https://cmd.dayi.ink/uploads/upload_c3fb1d1e64ab6f106348d8a774ddc3f9.png

File 3:

regionservers

https://cmd.dayi.ink/uploads/upload_86ad0ff3591083f7ef0d7492ca5d7f20.png

3. Add environment variables

export HBASE_HOME=~/resources/hbase
export PATH=$PATH:$HBASE_HOME/bin
export HADOOP_CLASSPATH=$HBASE_HOME/lib/*
https://cmd.dayi.ink/uploads/upload_7a4622613df9549ef2edd7ab4ea67d28.png

After saving, source it.

On the two slave nodes:

source ~/.bash_profile

4. Send the files over

scp -r ~/resources/hbase slave1:~/resources/
scp -r ~/resources/hbase slave2:~/resources/
scp -r ~/.bash_profile slave1:~/
scp -r ~/.bash_profile slave2:~/
https://cmd.dayi.ink/uploads/upload_94071e2d6d6c70a937c14c8ae954e3e2.png
https://cmd.dayi.ink/uploads/upload_ac66b20bba40773f01f14072c7f79e71.png

Then source it:

#slave1
source ~/.bash_profile
#slave2
source ~/.bash_profile

5. Launch Genshin Impact

hadoop:

~/resources/hadoop/sbin/./start-all.sh

Verify: http://master:50070

If this happens, just run it again:

https://cmd.dayi.ink/uploads/upload_985c8eebe55e6852f757ac8b724ff0b1.png
https://cmd.dayi.ink/uploads/upload_b520f05afed806d515331a6e4d47f2fe.png

zookeeper:

#master
mkdir -pv /tmp/zookeeper && cd /tmp/zookeeper && echo 1 > myid
~/resources/zookeeper/bin/./zkServer.sh start
#slave1
mkdir -pv /tmp/zookeeper && cd /tmp/zookeeper && echo 2 > myid
~/resources/zookeeper/bin/./zkServer.sh start
#slave2
mkdir -pv /tmp/zookeeper && cd /tmp/zookeeper && echo 3 >myid
~/resources/zookeeper/bin/./zkServer.sh start

Verify:

~/resources/zookeeper/bin/./zkServer.sh status
https://cmd.dayi.ink/uploads/upload_2e76a208ac4301c2c69e8429f14741b7.png
Should say leader or follower.

hbase:

~/resources/hbase/bin/start-hbase.sh

Verify: http://master:60010
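A quick jps check before opening the web UI (standard HBase daemon names; zookeeper's QuorumPeerMain should also still be running on all three nodes):

```bash
jps
# master should additionally show:  HMaster
# slaves should additionally show:  HRegionServer
```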

https://cmd.dayi.ink/uploads/upload_75f8de1e94cf4499a368faec6c2f3066.png
https://cmd.dayi.ink/uploads/upload_a48903837ca9a61c8790faf0b03e1847.png

20. Using HBase

1. The shell

hbase shell

https://cmd.dayi.ink/uploads/upload_57a6c3851b87eb115bb4dedafe1660b0.png

2. CRT

https://cmd.dayi.ink/uploads/upload_2d7d1db6ecd3cfb5f8b465a754f26a0b.png
https://cmd.dayi.ink/uploads/upload_93d77dc324ef32177c53a8f5314ae3e6.png
https://cmd.dayi.ink/uploads/upload_35672e8b365466766e7e33cc9685fed6.png

3. Command 1: list

https://cmd.dayi.ink/uploads/upload_6413150a14420f9374ff706086d6efbf.png

4. Command 2: create a table

create 'stu','info','score'

        table   cf 1   cf 2
https://cmd.dayi.ink/uploads/upload_be03e339a758bcef793e6aa8dbae81ec.png

5. Command 3: write data

Try not to use more than three column families.

put 'stu','1','info:name','John'
https://cmd.dayi.ink/uploads/upload_d8548bb8aca16c12cda2992a04034b5e.png
put 'stu','2','info:age',20
put 'stu','3','info:name','Mary'
put 'stu','3','info:age',19

put 'stu','3','score:math',90
https://cmd.dayi.ink/uploads/upload_17bc0833c513560ba3c9d0d85152a5ea.png

6. Read data

Two ways:

get 'stu','1'

get 'stu','3'
https://cmd.dayi.ink/uploads/upload_b62bf364714d17ee4c3682eceacb84eb.png

scan reads all rows

scan 'stu'
https://cmd.dayi.ink/uploads/upload_5e45c29269b62c1d71b023d8b69145f3.png

View the table structure

describe 'stu'
https://cmd.dayi.ink/uploads/upload_112267cca7bd2d93df1a23712832f5fe.png
Table stu is ENABLED                                                                                                                                                                                            
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                     
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                                                                 
{NAME => 'score', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                                                                                
2 row(s) in 0.0680 seconds

7. Deletion

Delete a cell:

delete 'stu','3','score:Math'

https://cmd.dayi.ink/uploads/upload_3cf1d1ebf6071a0f399c4a0604598956.png

A failed delete doesn't seem to raise an error (note the case: the column written earlier was 'score:math').

Delete a row

deleteall 'stu','2'
scan 'stu'
https://cmd.dayi.ink/uploads/upload_2e11228af6a789d6100052690492d9ff.png

Drop a table

  1. Disable the table first
  2. Then drop it
disable 'stu'
drop 'stu'
list
https://cmd.dayi.ink/uploads/upload_26d95187f2eb7cc34e244f86e6be5387.png

21. MapReduce

My stomach was killing me and the diarrhea left me half dead, so I didn't keep up in class.

1. Install the JDK

jdk-8u102-windows-x64.exe. The install path must not contain Chinese characters; if you can, go with C:\Program Files\Java or D:\Program Files\Java. If you already have a Java environment that isn't 1.8, it's worth installing this one alongside it.

https://cmd.dayi.ink/uploads/upload_232cd27dd0ebaa0fca7ac79f65835d68.png

2. Extract hadoop

  1. Find a way to extract hadoop-2.6.4.tar.gz
https://cmd.dayi.ink/uploads/upload_5c46c0196a811730084af4f36e60b005.png
  2. Recommended directory: D:\hadoop\hadoop-2.6.4
https://cmd.dayi.ink/uploads/upload_ae8a5614f2b12ad0a8983140669a38fd.png
https://cmd.dayi.ink/uploads/upload_65ace725efc2180c5ad3af6852537093.png
  3. Then patch hadoop

Where a file needs replacing, just replace it.

Into D:\hadoop\hadoop-2.6.4\bin put:

  • the bin directory from hadoop-common-2.2.0-bin-master.zip (replacing files).
  • hadoop.dll into D:\hadoop\hadoop-2.6.4\bin (replace)
  • winutils.exe into D:\hadoop\hadoop-2.6.4\bin (replace)
https://cmd.dayi.ink/uploads/upload_bff51d9c454196369db8abe11dfee5d9.png
https://cmd.dayi.ink/uploads/upload_feaf653d841e38c2f4cc10335ec4e974.png

3. Configure environment variables

This step just adds 4 (well, 3) environment variables. It isn't complicated, but the details vary with your machine, hence the length.

How to edit environment variables: This PC / Computer (in File Explorer, in case the desktop icon is a shortcut) -> right-click -> Properties -> Advanced system settings -> Advanced -> Environment Variables

https://cmd.dayi.ink/uploads/upload_e293937090c0344474f11f00f27f8463.png
(step 5 at the end)

Adjust the paths to your actual setup; user variables or system variables both work.

A JAVA_HOME path containing spaces seems to cause problems, by the way.

  • JAVA_HOME : C:\Program Files\Java\jdk1.8.0_102

    https://cmd.dayi.ink/uploads/upload_70f156b1544602ac99c67b9aeecac680.png
  • classpath : .;C:\Program Files\Java\jdk1.8.0_102\lib\dt.jar;C:\Program Files\Java\jdk1.8.0_102\lib\tools.jar;C:\Program Files\Java\jdk1.8.0_102\bin

  • HADOOP_HOME: D:\hadoop\hadoop-2.6.4

  • Path (don't delete it and start over like *天宇 did; edit the existing value. Your system already has a Path variable, so find it first. Order matters: earlier entries are searched first):

    • in the old-style editor, append ;C:\Program Files\Java\jdk1.8.0_102\bin\;D:\hadoop\hadoop-2.6.4\bin
      https://cmd.dayi.ink/uploads/upload_ab2c2161ee4184491f02ef3dc19fdeba.png
    • in the new-style editor, create two entries and move them to the top
    • C:\Program Files\Java\jdk1.8.0_102\bin\
    • D:\hadoop\hadoop-2.6.4\bin
    • https://cmd.dayi.ink/uploads/upload_d5aceda67b11663324c28cb3c1b991cd.png
https://cmd.dayi.ink/uploads/upload_fa552daae4a2542985cd9086b567fee4.png
  • Verify the environment variables
java -version
echo %JAVA_HOME%
ECHO %HADOOP_HOME%
hadoop version

D:\jdk\jdk1.8.0_102

https://cmd.dayi.ink/uploads/upload_0b7b311f92ea79d28422f9f24b554de6.png

Ran into this problem:

https://cmd.dayi.ink/uploads/upload_c19276cbecb8316c41aa761d8875d927.png

I don't know whether it actually matters. The cause: the space in Program Files breaks things. Two fixes:

  • Method 1: in JAVA_HOME, replace Program Files with PROGRA~1; here I used C:\PROGRA~1\Java\jdk1.8.0_102

    https://cmd.dayi.ink/uploads/upload_468911975614b0dff6109043e95f391d.png
  • Method 2 (recommended): copy the JDK into a plain directory (no spaces) and point JAVA_HOME there

    https://cmd.dayi.ink/uploads/upload_99fecbffb2f3c804e3491f02d7a7b808.png
    https://cmd.dayi.ink/uploads/upload_902711c1c49927c74a4be54d55661b83.png
  • Method 3: mine kept the space and still ran fine; the teacher says that's just how it is (explanations welcome)

A JAVA_HOME path with spaces seems to behave strangely:

https://cmd.dayi.ink/uploads/upload_2e6befe43185bc321f7f29d7b6080456.JPG
Changed to JAVA_HOME = "C:/xxxxx/jre1.8_102" (this is what quotes get you)
https://cmd.dayi.ink/uploads/upload_ffe720605e22314d9f62cb22ddd77bad.JPG
Copied the JDK to a directory without spaces and updated hadoop's env
https://cmd.dayi.ink/uploads/upload_63c4ba597b959347ab2d923e1a3e6393.JPG
That seems to work. Feels wrong, though; choking on a space seems like too basic a bug.
https://cmd.dayi.ink/uploads/upload_2dd586dd5d4f34fd1e110e047527c609.JPG
https://cmd.dayi.ink/uploads/upload_4afcc3df4195d41a817608ec7b9fd5d0.JPG
  • You can also try editing line 25 of D:\hadoop\hadoop-2.6.4\etc\hadoop\hadoop-env.cmd
    https://cmd.dayi.ink/uploads/upload_95d56a8e542c304bf58beff78ef8f6dc.png

In any case, run:

java -version
echo %JAVA_HOME%
ECHO %HADOOP_HOME%
hadoop version

The first and last commands should give output like this; a little extra noise in yours probably doesn't matter much:

https://cmd.dayi.ink/uploads/upload_0bf642a7c8167feb8106e6bc89d30774.png

4. Create a project

1. Configure Eclipse

Thanks to Wang Ce.

Window -> Preferences -> Java -> Installed JREs -> Add -> Standard VM

https://cmd.dayi.ink/uploads/upload_c2228e8e9f485f1f54f99f6de6de7c66.png
https://cmd.dayi.ink/uploads/upload_2d8c2e4cf1fd6d94af8b21b1bdcfbe67.png

2. Create the project

Thanks to Wang Ce: File -> New -> Java Project -> Next -> name it -> Finish

https://cmd.dayi.ink/uploads/upload_2780b4950e371be420427df03826762b.png
https://cmd.dayi.ink/uploads/upload_baefbbaade1f38ba6ab272b299ecad3a.png

3. Import the jars

  1. Alt+Enter opens the project properties
  2. Java Build Path -> Add External JARs
  3. Import the Hadoop jars:
all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\common
all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\common\lib

all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\hdfs
all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\hdfs\lib

all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\mapreduce
all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\mapreduce\lib

all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\yarn
all jars in D:\hadoop\hadoop-2.6.4\share\hadoop\yarn\lib
https://cmd.dayi.ink/uploads/upload_a0e2067471f2294c3ad2657c624743f2.png
https://cmd.dayi.ink/uploads/upload_7e24bf34d1fd278bba1300f777ce7d66.png
https://cmd.dayi.ink/uploads/upload_51cc53e6a04281a75b5dd505cffca452.png

5. Create packages

Create org.apache.hadoop.io.nativeio

org.apache.hadoop.io.nativeio

https://cmd.dayi.ink/uploads/upload_490c2b71d9f07fb7e035a9e06b20094e.png
https://cmd.dayi.ink/uploads/upload_c3e0407a02ef2e05b3f958ebc6e3b965.png

Then put NativeIO.java into the package.

You can drag it straight in.

https://cmd.dayi.ink/uploads/upload_552b2bdbc61666a394e81952ba8dd840.png
https://cmd.dayi.ink/uploads/upload_5f6ea70516029ebde93de1ba8354f176.png

Create a package named my

my

https://cmd.dayi.ink/uploads/upload_540914d11f205d05c5e3a9f592d21bec.png

Likewise, move the files in.

If you need to change anything, it's here:

https://cmd.dayi.ink/uploads/upload_7fcc2a4097bc7409078a3ad30948bfb0.png

On master, fix the directory permissions.

master:

#start hadoop
~/resources/hadoop/sbin/./start-all.sh

Check the page: http://master:50070/dfshealth.html#tab-datanode

https://cmd.dayi.ink/uploads/upload_51f930acbb7cff8008281444e10b8c9a.png

If that looks normal, set 777 permissions:

hadoop fs -chmod 777 /
https://cmd.dayi.ink/uploads/upload_ea90787f58f4a3cadd206f9af8a54e65.png

6. Try creating a directory

Run Test.java

https://cmd.dayi.ink/uploads/upload_1fdace55a47df0116c03aa8bfd89ebed.png

And that's it:

https://cmd.dayi.ink/uploads/upload_175c4e4c13d5b249f4d6c7fdbae6ade6.png
https://cmd.dayi.ink/uploads/upload_a9baa3e230097f8ee9dcd822005075d2.png

Word count

https://cmd.dayi.ink/uploads/upload_cc0c19fe86d2febe9e9f8e055702edc7.png
  • line 21 may need reconfiguring

Store the source data

master:

mkdir -pv ~/aa
cd ~/aa
vim ~/aa/words
cat ~/aa/words

[dayi@dayi-bigdata-master aa]$ cat ~/aa/words
ovo ovo
Hello words
Pig man
ZOOKEEPER
HADOOP
Hello words
ovo ovo
Zhangsan like pigman
Do u wanna build a snowman?
Come on let’s go and play!
I never see you anymore
come out the door
https://cmd.dayi.ink/uploads/upload_87ace337f1fa1ab70eaafb4cae96a9fc.png

Upload words

  • Is the directory layout right?
hadoop fs -ls /

Found 6 items
drwxr-xr-x   - dayi          supergroup          0 2023-10-26 17:09 /hbase
drwxr-xr-x   - dayi          supergroup          0 2023-10-11 14:43 /ovo
drwxr-xr-x   - dayi          supergroup          0 2023-10-18 23:40 /s1
drwx-wx-wx   - dayi          supergroup          0 2023-10-18 23:57 /tmp
drwxr-xr-x   - dayi          supergroup          0 2023-10-18 23:01 /user
drwxr-xr-x   - Administrator supergroup          0 2023-11-01 23:37 /ww
[dayi@dayi-bigdata-master aa]$ 

You need a /ww directory.
If it's missing:

hadoop fs -mkdir /ww
hadoop fs -chmod 777 /ww
  • Upload the file
hadoop fs -put ~/aa/words /ww
hadoop fs -cat /ww/words
https://cmd.dayi.ink/uploads/upload_09095e938ca0eab2fa83017f45e34b61.png

Run the program

https://cmd.dayi.ink/uploads/upload_eb61d8f3b69f0e4689071fa3d8ed7572.png
https://cmd.dayi.ink/uploads/upload_9a95a2f9dc01eb2573b48b866561cf1e.png

View the output:

http://master:50070/explorer.html#/ww/result

https://cmd.dayi.ink/uploads/upload_ed2b72ffacf9d204fc36b22c13491d76.png
https://cmd.dayi.ink/uploads/upload_b77210e4a2b57fb504b7b86d6bca62db.png

22. MapReduce code

Reading a file

package my;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadData {
    public static void main(String[] args) throws Exception {
        String hdfs_addr = "hdfs://dayi-bigdata-master:9000";
        String uriString = hdfs_addr + "/ww/words";
        Configuration config = new Configuration();
        FSDataInputStream inputStream = null;
        try {
            // connect to HDFS and open the file
            FileSystem fileSystem = FileSystem.get(URI.create(hdfs_addr), config);
            inputStream = fileSystem.open(new Path(uriString));
            // copy the file contents to stdout
            IOUtils.copyBytes(inputStream, System.out, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(inputStream);
        }
    }
}
https://cmd.dayi.ink/uploads/upload_5276736e3af2206b18872125e3557c93.png

Creating a directory

Uploading a file

package my;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutFile {
  public static void main(String[] args) {
    String hdfs_addr = "hdfs://dayi-bigdata-master:9000";
    try {
      FileSystem fileSystem = FileSystem.get(URI.create(hdfs_addr), new Configuration());
      // local source file and HDFS destination
      Path src = new Path("C:\\Windows\\System32\\drivers\\etc\\hosts");
      Path desc = new Path(hdfs_addr + "/ww/ff");
      fileSystem.copyFromLocalFile(src, desc);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Downloading a file

hadoop fs -chmod 777 /ww/ff
package my;

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class GetFile {
  public static void main(String[] args) throws IOException {
    String hdfs_addr = "hdfs://dayi-bigdata-master:9000";
    FileSystem fileSystem = FileSystem.get(URI.create(hdfs_addr), new Configuration());
    Path src = new Path(hdfs_addr + "/ww/ff");

    FileOutputStream outputStream = null;
    FSDataInputStream inputStream = null;
    try {
      outputStream = new FileOutputStream("D:\\c.txt");
      inputStream = fileSystem.open(src);
      // copy from HDFS to the local file
      IOUtils.copyBytes(inputStream, outputStream, 4096, false);
    } catch (Exception e) {
      e.printStackTrace();
    } finally {
      // close the streams so the local file is flushed to disk
      IOUtils.closeStream(inputStream);
      IOUtils.closeStream(outputStream);
    }
  }
}
https://cmd.dayi.ink/uploads/upload_b26e24f0c5665fd6061599b40e4dc167.png

Listing files

(The source file was handed out in class.)

https://cmd.dayi.ink/uploads/upload_d3ca2126a46ed4a91f55203c5567fff6.png

Deleting files

(The file was handed out in class.)

https://cmd.dayi.ink/uploads/upload_73142271575935dfc41b242568a711c1.png

The end

OVO
