After the upload has finished, connect to the cluster by using SSH. On the command prompt, enter the following command:

```bash
ssh <ssh-username>@<cluster-name>-ssh.azurehdinsight.net
```

If you use a password to authenticate your SSH username, you're prompted for the password. If you use a public key, you might need to use the `-i` parameter and specify the path to the matching private key.

Use the following command to unzip the .zip file:

```bash
unzip "On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2016_1.zip"
```

Use the following command to create the Data Lake Storage Gen2 container. Replace the `<container-name>` placeholder with the name that you want to give your container. Replace the `<storage-account-name>` placeholder with the name of your storage account.

```bash
hadoop fs -D "fs.azure.createRemoteFileSystemDuringInitialization=true" -ls abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/
```

Use the following command to create a directory:

```bash
hdfs dfs -mkdir -p abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/data
```

Use the following command to copy the .csv file to the directory. Use quotes around the file name if the file name contains spaces or special characters:

```bash
hdfs dfs -put "On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2016_1.csv" abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/data/
```

In this section, you use Beeline to run an Apache Hive job. As part of the Apache Hive job, you import the data from the .csv file into an Apache Hive table named **delays**.

From the SSH prompt that you already have for the HDInsight cluster, use the following command to create and edit a new file named flightdelays.hql:

```bash
nano flightdelays.hql
```

Modify the following text by replacing the `<container-name>` and `<storage-account-name>` placeholders with your container and storage account name. Then copy and paste the text into the nano console by pressing the SHIFT key along with the right-mouse select button.

```sql
DROP TABLE delays_raw;
-- Creates an external table over the csv file
CREATE EXTERNAL TABLE delays_raw (
    YEAR string,
    FL_DATE string,
    UNIQUE_CARRIER string,
    CARRIER string,
    FL_NUM string,
    ORIGIN_AIRPORT_ID string,
    ORIGIN string,
    ORIGIN_CITY_NAME string,
    ORIGIN_STATE_ABR string,
    DEST_AIRPORT_ID string,
    DEST string,
    DEST_CITY_NAME string,
    DEST_STATE_ABR string,
    DEP_DELAY_NEW float,
    ARR_DELAY_NEW float,
    CARRIER_DELAY float,
    WEATHER_DELAY float,
    NAS_DELAY float,
    SECURITY_DELAY float,
    LATE_AIRCRAFT_DELAY float)
-- The following lines describe the format and location of the file
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/data';

-- Drop the delays table if it exists
DROP TABLE delays;
-- Create the delays table and populate it with data
-- pulled in from the CSV file (via the external table defined previously)
CREATE TABLE delays AS
SELECT YEAR AS Year,
    FL_DATE AS FlightDate,
    substring(UNIQUE_CARRIER, 2, length(UNIQUE_CARRIER) -1) AS IATA_CODE_Reporting_Airline,
    substring(CARRIER, 2, length(CARRIER) -1) AS Reporting_Airline,
    substring(FL_NUM, 2, length(FL_NUM) -1) AS Flight_Number_Reporting_Airline,
    ORIGIN_AIRPORT_ID AS OriginAirportID,
    substring(ORIGIN, 2, length(ORIGIN) -1) AS OriginAirportSeqID,
    substring(ORIGIN_CITY_NAME, 2) AS OriginCityName,
    substring(ORIGIN_STATE_ABR, 2, length(ORIGIN_STATE_ABR) -1) AS OriginState,
    DEST_AIRPORT_ID AS DestAirportID,
    substring(DEST, 2, length(DEST) -1) AS DestAirportSeqID,
    substring(DEST_CITY_NAME, 2) AS DestCityName,
    substring(DEST_STATE_ABR, 2, length(DEST_STATE_ABR) -1) AS DestState,
    DEP_DELAY_NEW AS DepDelay,
    ARR_DELAY_NEW AS ArrDelay,
    CARRIER_DELAY AS CarrierDelay,
    WEATHER_DELAY AS WeatherDelay,
    NAS_DELAY AS NASDelay,
    SECURITY_DELAY AS SecurityDelay,
    LATE_AIRCRAFT_DELAY AS LateAircraftDelay
FROM delays_raw;
```

Save the file by typing CTRL+X and then typing Y when prompted.

To start Hive and run the flightdelays.hql file, use the following command:

```bash
beeline -u 'jdbc:hive2://localhost:10001/;transportMode=http' -f flightdelays.hql
```

After the flightdelays.hql script finishes running, use the following command to open an interactive Beeline session:

```bash
beeline -u 'jdbc:hive2://localhost:10001/;transportMode=http'
```

When you receive the jdbc:hive2://localhost:10001/> prompt, use the following query to retrieve data from the imported flight delay data:

```sql
INSERT OVERWRITE DIRECTORY '/tutorials/flightdelays/output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT regexp_replace(OriginCityName, '''', ''),
    avg(WeatherDelay)
FROM delays
WHERE WeatherDelay IS NOT NULL
GROUP BY OriginCityName;
```

This query retrieves a list of cities that experienced weather delays, along with the average delay time, and saves it to /tutorials/flightdelays/output. Later, Sqoop reads the data from this location and exports it to Azure SQL Database. You need the server name from SQL Database for that operation.

To exit Beeline, enter !quit at the prompt.
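Before moving on to the export, it can help to confirm that the Hive job populated the delays table and that the query wrote its output files. The following checks are optional additions, not part of the original steps; they assume the delays table and output directory shown above and use only the beeline and HDFS commands already introduced.

```bash
# Optional sanity checks (assumes the delays table and output path above).
# Count the rows that the Hive job loaded into the delays table.
beeline -u 'jdbc:hive2://localhost:10001/;transportMode=http' \
    -e 'SELECT COUNT(*) FROM delays;'

# List the files that the INSERT OVERWRITE query produced,
# then preview the first few tab-delimited rows.
hdfs dfs -ls /tutorials/flightdelays/output
hdfs dfs -cat /tutorials/flightdelays/output/* | head -n 10
```

If the count is zero or the output directory is empty, recheck the placeholder values in flightdelays.hql before continuing.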
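To make the upcoming Sqoop step concrete, here is a minimal sketch of what the export might look like. The server name, database name, credentials, and the assumption that a matching delays table already exists in Azure SQL Database are all placeholders rather than values from this section; the tab delimiter matches the FIELDS TERMINATED BY '\t' clause in the query above.

```bash
# A sketch only: export the tab-delimited query output to Azure SQL Database.
# <server-name>, <database-name>, and the credentials are assumed placeholders,
# and a target table named delays must already exist in the database.
sqoop export \
    --connect "jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<database-name>" \
    --username <sql-admin> \
    --password <sql-admin-password> \
    --table delays \
    --export-dir '/tutorials/flightdelays/output' \
    --fields-terminated-by '\t' \
    -m 1
```

A single mapper (`-m 1`) keeps the export simple for a small result set; a larger result set would justify raising the mapper count.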