Inserting millions of rows into MongoDB

Roshan Paiva
Oct 5, 2017

Context:

We have a service that post-processes IoT device logs and derives intelligence that is then fed back into our backend system to make it better. The intelligence is stored in MongoDB. The system is built on Node.js and runs on AWS. The logs are sent to the system via RabbitMQ. A single device log can contain anywhere from 1K to 1M lines, and there are over 1M devices in the field.

The system tested well and ran well in staging. However, once it was running in production we noticed that the Node processes would periodically hang, causing a backlog to build up in the queues.

The hacky way to get past this was to reboot the servers, which got things working again temporarily until the next hang.

The investigation:

  • We dived into the logs and saw this:
<--- Last few GCs --->
[2026:0x2e34aa0] 76668 ms: Mark-sweep 1408.3 (1470.4) -> 1408.2 (1439.4) MB, 2701.6 / 0.0 ms (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 2702 ms) last resort
[2026:0x2e34aa0] 79269 ms: Mark-sweep 1408.2 (1439.4) -> 1408.1 (1439.4) MB, 2601.1 / 0.0 ms last resort

<--- JS stacktrace --->
==== JS stack trace =========================================

Security context: 0x89c4c913471 <JS Object>
1: new constructor(aka InternalCache) [/srv/www/adp/current/node_modules/mongoose/lib/internal.js:~10] [pc=0x10aefcd63746](this=0x3e58bd884399 <an InternalCache with map 0x37c6e526b501>)
3: constructor(aka Document) [/srv/www/adp/current/node_modules/mongoose/lib/document.js:~44] [pc=0x10aefc9b3809](this=0x3e58bd884321 <a model with map 0x37c6e526eaf9>,obj=0x10f4144bf4d1 …

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [node]
2: 0x12c6a1c [node]
3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [node]
6: v8::internal::Runtime_AllocateInTargetSpace(int, v8::internal::Object**, v8::internal::Isolate*) [node]
7: 0x10aefc30ea5f
  • We also noticed that the crash occurred right after processing a log file that produced at least 250K rows.
  • The process seemed to hang while writing to MongoDB.

The above gave us a good hint that the problem was in how we were using Mongoose/MongoDB, so we took a look at how we were inserting the processed rows.

It turned out we were inserting rows using Model.create(<collection>), which builds a full Mongoose document in memory for every row. While this tested well under small loads, it is not the recommended approach for storing large data sets.
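For illustration, here is a minimal sketch of what we had been doing (the LogEntry model and its fields are hypothetical stand-ins for our actual schema):

const mongoose = require('mongoose');

// Hypothetical stand-in for our processed-log schema.
const LogEntry = mongoose.model('LogEntry', new mongoose.Schema({
  deviceId: String,
  line: String,
}));

// Model.create() builds a full Mongoose Document for every element of the
// array (note the InternalCache constructor in the stack trace above) and
// issues a separate save for each one, so a 250K-row batch means 250K
// Documents sitting in the heap.
async function saveRows(rows) {
  await LogEntry.create(rows);
}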

There are two alternatives: the underlying driver's collection.insert() and Mongoose's Model.insertMany(). We went with Model.insertMany(), which batches the operation into groups of 1000. So far it has been working with no hiccups.
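The switch itself was a one-line change. A sketch, again using the hypothetical LogEntry model from above:

// insertMany() validates the rows and hands them to the driver as bulk
// inserts, instead of one save round trip per document.
async function saveRows(rows) {
  await LogEntry.insertMany(rows);
}

For the very largest log files it may still be worth inserting in fixed-size chunks, so the whole processed array never has to live in memory at once.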

Lessons Learnt:

  • Add load-testing scenarios. We never uncovered this in our tests or in staging because we didn’t test with larger log files; we focused on functionality rather than striking a balance between functionality, scale, and performance.
  • Don’t pick the first function that satisfies your requirement. Take the time to understand all the features and functions exposed by the driver and library.

Ref: http://guyharrison.squarespace.com/blog/2016/11/7/bulk-inserts-in-mongodb.html
