performance - Merging two datasets in Python efficiently
What is the most efficient way to merge two datasets using Python?
A little background: this code will take 100K+ records in the following format:
{user: aUser, transaction: UsersTransactionNumber}, ...
and, using the following data:
{transaction: aTransactionNumber, activationNumber: associatedActivationNumber}, ...
produce:
{user: aUser, activationNumber: associatedActivationNumber}, ...
N.B. These are not Python dictionaries; they are just the closest thing to portraying the record format clearly.
So in theory, all I am trying to do is join two lists (or tables) on a common key. At first this pointed me towards sets (unions, etc.), but before I start learning these in depth: are they the way to go? So far I thought it could be implemented as:
- Create a list of dictionaries and iterate over the list, comparing the key each time. However, in the worst case this could run up to len(inputDict) * len(outputDict) <- Not sure?
- Manipulate the data as an in-memory SQLite table?
- Sort the lists first (I do not know how expensive this is).
- Build a dictionary with the transaction number as the key, then store the user and activation code in a list as the value.
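For the in-memory SQLite option, here is a minimal sketch using the standard-library `sqlite3` module. It assumes the records are already loaded as Python tuples; the table and column names (`users`, `activations`, `user`, `txn`, `activation`) and the sample values are hypothetical placeholders, not taken from the real CSVs.

```python
import sqlite3

def join_with_sqlite(user_rows, activation_rows):
    """Join (user, txn) rows to (txn, activation) rows on txn in an in-memory DB.

    Hypothetical schema; assumes each txn appears at most once in activation_rows.
    Returns a list of (user, activation) tuples ordered by user.
    """
    conn = sqlite3.connect(":memory:")  # the database lives only in RAM
    try:
        conn.execute("CREATE TABLE users (user TEXT, txn TEXT)")
        # PRIMARY KEY gives SQLite an index on txn for the join lookups.
        conn.execute("CREATE TABLE activations (txn TEXT PRIMARY KEY, activation TEXT)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", user_rows)
        conn.executemany("INSERT INTO activations VALUES (?, ?)", activation_rows)
        cur = conn.execute(
            "SELECT u.user, a.activation "
            "FROM users AS u JOIN activations AS a ON u.txn = a.txn "
            "ORDER BY u.user"
        )
        return cur.fetchall()
    finally:
        conn.close()

users = [("alice", "T1"), ("bob", "T2"), ("carol", "T3")]
activations = [("T2", "ACT-222"), ("T1", "ACT-111")]
print(join_with_sqlite(users, activations))  # [('alice', 'ACT-111'), ('bob', 'ACT-222')]
```

Note that carol, whose transaction has no activation row, drops out of this inner join; a LEFT JOIN would keep her with a NULL activation.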
Explanation
The whole purpose of this script is to summarize; the actual datasets come from two different sources. The users and their transaction numbers come from a performance test as an output CSV; that test measures email activation code throughput. The second dataset comes from parsing the test mailboxes, which contain the transaction ID and activation code. The output of this step is then a CSV that gets pumped back into stage 2 of the performance test, which activates the user accounts using those activation codes.
If my notation for the records was misleading, I have updated it accordingly.
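Thanks for the answers; I am going to give two of the ideas above a try: sorting the lists first, and building a dictionary keyed on transaction number.
Performance isn't overly critical for me; I just want to get into good habits with my Python programming.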
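The "sort the lists first" idea amounts to a sort-merge join. A minimal sketch, assuming both inputs fit in memory as lists of dicts with the hypothetical keys `user`, `transaction` and `activationNumber`, and that each transaction number appears at most once in the activation list: sort both sides by transaction, then walk them together in one linear pass.

```python
from operator import itemgetter

def sort_merge_join(user_recs, activation_recs):
    """Pair each user with the activation number of their transaction.

    Assumes transaction numbers are unique within activation_recs;
    users whose transaction has no match are skipped.
    """
    key = itemgetter("transaction")
    users = sorted(user_recs, key=key)       # O(n log n)
    acts = sorted(activation_recs, key=key)  # O(m log m)
    result = []
    i = j = 0
    # Advance whichever side has the smaller key; emit a row when keys are equal.
    while i < len(users) and j < len(acts):
        u_key, a_key = key(users[i]), key(acts[j])
        if u_key < a_key:
            i += 1
        elif u_key > a_key:
            j += 1
        else:
            result.append({"user": users[i]["user"],
                           "activationNumber": acts[j]["activationNumber"]})
            i += 1  # keep j: the next user might share this transaction
    return result

users = [{"user": "bob", "transaction": "T2"},
         {"user": "alice", "transaction": "T1"},
         {"user": "carol", "transaction": "T9"}]
acts = [{"transaction": "T1", "activationNumber": "A1"},
        {"transaction": "T2", "activationNumber": "A2"}]
print(sort_merge_join(users, acts))
# [{'user': 'alice', 'activationNumber': 'A1'}, {'user': 'bob', 'activationNumber': 'A2'}]
```

The merge pass itself is linear, so the total cost is dominated by the two sorts: O(n log n) overall, versus roughly O(n + m) for a dictionary lookup.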
Here's a radical approach: don't.
You have two CSV files. One (users) is clearly the driver; leave it alone. The other (transaction codes for each user) can be turned into a simple dictionary.
Don't "combine" or "join" anything unless it is absolutely necessary. Certainly don't "merge" or "pre-join".
Write your application to do simple lookups in the other collection.
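"Create a list of dictionaries and iterate over the list, comparing the key each time,"
Close. It looks like this. Note: no sorting.

    import csv

    # Lookup side: read activations.csv once into a dict keyed on the user column.
    with open('activations.csv', newline='') as act_data:
        rdr = csv.DictReader(act_data)
        activations = {row['user']: row for row in rdr}

    # Driving side: stream users.csv, enrich each row from the dict, write users_2.csv.
    with open('users.csv', newline='') as user_data:
        rdr = csv.DictReader(user_data)
        with open('users_2.csv', 'w', newline='') as updated_data:
            # Placeholder column list (include 'some_field');
            # extrasaction='ignore' drops any input columns not listed here.
            wtr = csv.DictWriter(updated_data, ['some', 'list', 'of', 'columns'],
                                 extrasaction='ignore')
            wtr.writeheader()
            for user in rdr:
                user['some_field'] = activations[user['user_id_column']]['some_field']
                wtr.writerow(user)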
This is fast and simple. Save the dictionaries (use shelve or pickle).
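A minimal sketch of saving a lookup dictionary with `pickle`, so later runs can skip re-reading the CSV. The file name `activations.pickle` and the sample contents are hypothetical. `shelve` is the alternative when the dictionary is too large to load all at once, since it keeps entries on disk and unpickles each value only when its key is accessed. Only load pickle files you created yourself; unpickling untrusted data is unsafe.

```python
import os
import pickle
import tempfile

def save_lookup(lookup, path):
    """Write the dictionary to disk in pickle's binary format."""
    with open(path, "wb") as f:
        pickle.dump(lookup, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_lookup(path):
    """Read back a dictionary previously written by save_lookup."""
    with open(path, "rb") as f:
        return pickle.load(f)

activations = {"T1": "A1", "T2": "A2"}  # hypothetical transaction -> activation map
path = os.path.join(tempfile.gettempdir(), "activations.pickle")
save_lookup(activations, path)
restored = load_lookup(path)
print(restored == activations)  # True
```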
"However, in the worst case this could run up to len(inputDict) * len(outputDict) <- Not sure?"
False.
One list is the "driving" list; the other is the lookup list. You drive by iterating through the users and looking up the appropriate value for each transaction. This is O(n) in the number of users. Each lookup is O(1) on average, because dictionaries are hash tables.
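To tie the driver/lookup pattern back to the record layout in the question (user → transaction, transaction → activation number), here is a sketch. It assumes CSV headers named `user`, `transaction` and `activationNumber` (hypothetical; the real files may differ), and it reads in-memory `io.StringIO` text in place of real files so it runs on its own.

```python
import csv
import io

# Hypothetical sample inputs standing in for the two real CSV files.
USERS_CSV = "user,transaction\nalice,T1\nbob,T2\ncarol,T3\n"
MAILBOX_CSV = "transaction,activationNumber\nT2,A2\nT1,A1\n"

def pair_activations(users_file, mailbox_file, out_file):
    """Write user,activationNumber rows; return the number of unmatched users."""
    # Lookup side: one O(m) pass building a hash table keyed on transaction.
    activation_by_txn = {row["transaction"]: row["activationNumber"]
                         for row in csv.DictReader(mailbox_file)}
    wtr = csv.DictWriter(out_file, fieldnames=["user", "activationNumber"])
    wtr.writeheader()
    missing = 0
    # Driving side: one O(n) pass over users; each dict lookup is O(1) on average.
    for row in csv.DictReader(users_file):
        activation = activation_by_txn.get(row["transaction"])
        if activation is None:
            missing += 1  # no mailbox entry for this transaction
            continue
        wtr.writerow({"user": row["user"], "activationNumber": activation})
    return missing

out = io.StringIO()
unmatched = pair_activations(io.StringIO(USERS_CSV), io.StringIO(MAILBOX_CSV), out)
print(out.getvalue())
print(unmatched)  # 1 (carol's T3 has no mailbox entry)
```

The total work is O(n + m): one pass to build the dictionary and one pass over the users, with no nested loop and no sorting.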