Retrieving information from aggregated weblogs data, how to do it?
I would like to know how to retrieve data from the collected logs. This is what I have:
- approximately 30 GB of uncompressed log data loaded into HDFS daily (and this will soon grow to about 100 GB)
This is my idea:
- every night this data is processed with Pig
- the logs are read and split, and a custom UDF retrieves fields such as timestamp, url, and user_id from each log entry (let's say this is all I need), then loads them into HBase (the log data will be stored indefinitely); a sketch of such a UDF follows below
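For illustration, here is a minimal sketch of what such a Pig UDF might look like in Java. The space-delimited log format, the field positions, and the class name are all assumptions, not part of my actual setup:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical UDF: pulls (timestamp, url, user_id) out of a raw log line.
// Assumes a space-delimited format "<timestamp> <user_id> <url> ...".
public class ExtractLogFields extends EvalFunc<Tuple> {
    private static final TupleFactory TUPLES = TupleFactory.getInstance();

    @Override
    public Tuple exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        String line = (String) input.get(0);
        String[] fields = line.split("\\s+");
        if (fields.length < 3) {
            return null; // skip malformed entries
        }
        Tuple out = TUPLES.newTuple(3);
        out.set(0, fields[0]); // timestamp
        out.set(1, fields[2]); // url
        out.set(2, fields[1]); // user_id
        return out;
    }
}
```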
Then if I want to know which users saw a particular page within a given time range, I can quickly query HBase without scanning the whole log data on every query (and I want fast answers; minutes are acceptable). There will also be many queries running at the same time.
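For a query like that to be fast, the HBase row key would have to encode both the page and the time. A minimal sketch of one possible design using the plain HBase client API; the table name, the column family "v", and the row-key format "<url>|<timestamp>" are assumptions for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PageViewQuery {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Assumed schema: row key "<url>|<yyyyMMddHHmmss>" keeps all views of
        // one page contiguous and time-ordered, so a time-range query becomes
        // a bounded scan instead of a full table scan.
        HTable table = new HTable(conf, "pageviews");
        String url = "/some/page";
        Scan scan = new Scan(
                Bytes.toBytes(url + "|" + "20110101000000"),   // start (inclusive)
                Bytes.toBytes(url + "|" + "20110201000000"));  // stop (exclusive)
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                byte[] userId = r.getValue(Bytes.toBytes("v"), Bytes.toBytes("user_id"));
                System.out.println(Bytes.toString(userId));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```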
What do you think of this workflow? Do you think that loading this information into HBase makes sense? What other options are there, and how do they compare to my solution? I appreciate all comments/questions and answers. Thank you in advance.
With Hadoop you are always doing one of two things: either processing data or querying data.
For what you are looking to do, I would suggest using Hive. You can take your data and then create an M/R job to process it and load it into Hive tables however you like (you could also partition it by date, which may be appropriate for speed so that you don't query data you don't need). From there you can query your data as you like. Here is a very good online tutorial.
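As a sketch of what that could look like through Hive's JDBC interface; the connection URL, table name, and column layout are assumptions, and the point is that a date-partitioned table lets a bounded query read only the partitions it needs:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveLogQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer endpoint; adjust host/port for your cluster.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();

        // Partitioning by day means a date-bounded query only touches the
        // partitions it needs instead of scanning the whole data set.
        stmt.execute("CREATE TABLE IF NOT EXISTS pageviews ("
                + " view_time STRING, url STRING, user_id STRING)"
                + " PARTITIONED BY (dt STRING)");

        // Which users saw a given page in January 2011 (assumed values).
        ResultSet rs = stmt.executeQuery(
                "SELECT DISTINCT user_id FROM pageviews"
                + " WHERE dt BETWEEN '2011-01-01' AND '2011-01-31'"
                + " AND url = '/some/page'");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        con.close();
    }
}
```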
There are many ways to solve this, but it sounds like HBase is a bit of overkill, unless you want to set up all the servers it requires as an exercise to learn it. HBase would be a good fit if you had thousands of people simultaneously trying to get at the information.
You may also want to look into Flume, which is a new import server from Cloudera. It will get your files from some place straight into HDFS.
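A minimal sketch of a Flume agent configuration in the later properties-file style; the agent name, the spooling directory, and the HDFS path are all hypothetical:

```
# One agent: pick up log files from a local directory and write them to HDFS.
agent1.sources = logsrc
agent1.channels = memch
agent1.sinks = hdfssink

# Source: ingest finished log files dropped into a spool directory.
agent1.sources.logsrc.type = spooldir
agent1.sources.logsrc.spoolDir = /var/log/incoming
agent1.sources.logsrc.channels = memch

# Channel: in-memory buffer between source and sink.
agent1.channels.memch.type = memory

# Sink: write the events into HDFS, bucketed by day.
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.channel = memch
agent1.sinks.hdfssink.hdfs.path = hdfs://namenode/logs/%Y-%m-%d
agent1.sinks.hdfssink.hdfs.useLocalTimeStamp = true
```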