Welcome to issue 44 of NoSQL Weekly. I am planning to add testimonials on NoSQLWeekly.com. If you love NoSQL Weekly, could you reply to this e-mail with a sentence or two as a testimonial with your full name and link to your site that I can use?
10gen Releases Cloud-Based Monitoring and Alerting Solution for MongoDB
10gen, the company behind MongoDB, has announced the general availability of MongoDB Monitoring Service (MMS), a new monitoring service that is available free of charge to all MongoDB users. MMS provides visibility into current and historical operations to facilitate proactive alerts, support and response on system performance and availability.
Cloudera Integrates Hadoop Distro with R Analytics
Revolution Analytics has announced that it will integrate Cloudera's Distribution with Apache Hadoop (CDH3) with its Revolution R Enterprise platform to create a joint offer called "RevoConnectR for Apache Hadoop." The product is meant to enable R developers to access Hadoop data stores and write MapReduce jobs directly with R.
Articles, Tutorials and Talks
A Graph-Based Movie Recommender Engine
This post demonstrates how to build a graph-based movie recommender engine using the publicly available MovieLens dataset, the graph database Neo4j, and the graph traversal language Gremlin.
Spring + MongoDB: Type Safe Queries
This blog post walks you through how you can use QueryDSL to provide type-safe queries when using Spring and MongoDB.
A triangle exists when a vertex has two adjacent vertices that are also adjacent to each other. Using friendship as an example: if two of your friends are also friends with each other, then the three of you form a friendship triangle. This concept is useful for understanding social networks and graph analysis in general (e.g., it can be used to compute the clustering coefficient of a graph). This post compares solutions for counting triangles using Hadoop, Pig, and Vertica.
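To make the triangle definition concrete, here is a minimal in-memory Python sketch. It is not taken from the linked post, which works with Hadoop, Pig, and Vertica at much larger scale; the function names and the tiny friendship graph are made up for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops, they cannot form a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Count distinct triangles in the graph.

    For every vertex, check each pair of its neighbors; if the pair is
    also connected, the three vertices form a triangle. Each triangle is
    seen once from each of its three corners, hence the division by 3.
    """
    seen = 0
    for neighbors in adj.values():
        for a, b in combinations(neighbors, 2):
            if b in adj[a]:
                seen += 1
    return seen // 3


def local_clustering(adj, vertex):
    """Fraction of a vertex's neighbor pairs that are themselves connected."""
    neighbors = adj.get(vertex, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)


# Hypothetical friendship graph: ann, bob and cat all know each other,
# and dan only knows cat.
friendships = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
graph = build_adjacency(friendships)
print(count_triangles(graph))
print(local_clustering(graph, "cat"))
```

Checking every neighbor pair is fine for a small graph held in memory; on very large graphs the pair enumeration around high-degree vertices becomes the expensive part.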
Wiki PageRank with Hadoop
This tutorial shows how to compute PageRank for Wikipedia using Hadoop. The English Wikipedia has 3.7M articles at the moment and is still growing. Each article has many links to other articles. With those incoming and outgoing links we can determine which pages are more important than others, which is basically what PageRank does.
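For readers new to the idea, the link-based ranking described above can be sketched in a few lines of Python. This is a single-machine illustration and not the tutorial's Hadoop code; the damping factor of 0.85, the fixed iteration count, and the three-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a small link graph.

    `links` maps each page to the list of pages it links to.
    Returns a dict {page: rank} whose values sum to roughly 1.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with equal rank everywhere.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank to all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-wiki: A and B both link to C, and C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph C ends up ranked highest because it has the most incoming links, and B ranks lowest because nothing links to it.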
Beating Google With CouchDB, Celery and Whoosh (Part 1)
This is the first post in a series of posts that will show you how to create a search engine using standard Python tools like Django, Celery and Whoosh with CouchDB as the backend.
Hadoop for Archiving Email
This post explores a specific use case for Apache Hadoop, one that is not commonly recognized, but is gaining interest behind the scenes. It has to do with converting, storing, and searching email messages using the Hadoop platform for archival purposes.
Nokia: Lessons Learnt Migrating a Very Large and Highly Relational Database into a "Classic" NoSQL
Enda Farrell discusses how they ported Nokia's places registry to NoSQL, the reasons, the complexity involved and the lessons learned along the way in terms of people, tools and data.
Castle: Re-inventing storage for Big Data
Castle, an open-source project, is a ground-up overhauling of RAID, file systems, and the POSIX interface. The target is 1 million random inserts per second to disk on a $1,000 commodity box, and we're nearly there. Castle is also the core of the Acunu Data Platform, which delivers up to 100x higher performance for applications written for Cassandra and other tools.
Optimizing your CouchDB Calls by 99%
In this talk, Tim Anglade explains the best practices that he has observed, tried and implemented, from a large pool of examples that includes the customer apps he has helped optimize and debug, as well as his own.
Fine-Tune Your Apache Hadoop Security Settings
The article covers some of the features of the Apache Hadoop security infrastructure that will help cluster administrators fine-tune the security settings of their clusters.
Scaling with RavenDB
In this webcast, Oren Eini and Nick VanMatre, Solutions Architect at Archstone, sit down to discuss the scaling options for Archstone's newest project, a re-architecture of their internal and external apartment-management applications. Discussed are the options for scaling RavenDB, including sharding, replication and multi-master setups.
Scaling with Riak at Showyou
This is a presentation on how Showyou uses the Riak datastore at Showyou.com, as well as work they have been doing on a custom Riak backend for search and analytics.
Getting Started with MMS
This post shows you how to get started with MMS (MongoDB Monitoring Service), a free hosted monitoring service for MongoDB.
How Tapad Uses Different DBs for Different Needs
In this interview, Dag Liodden, VP of Engineering at Tapad, talks about their NoSQL strategy and how they optimize their platform with different types of data stores for their various needs.
Machine Learning and Hadoop
Building a Real-Time Location-Based Urban Geofencing Game with Socket.io, Redis, Node.js and Sinatra Synchrony
Replacing RabbitMQ with MongoDB
Three Use Cases For Riak
Interesting Projects, Tools and Libraries
Monotable aims to provide a reliable distributed key-value data store, intended primarily for storing large numbers of small files. Monotable is implemented in a combination of Ruby and C.
DataFu is a collection of user-defined functions for working with large-scale data in Hadoop and Pig. It is used at LinkedIn in many of our off-line workflows for data derived products like "People You May Know" and "Skills".
Professor, a MongoDB Profile Viewer
Professor is a web application with corresponding command-line tool to read, summarize, and interpret MongoDB profiler output (for MongoDB 2.0 and later).
Cherrys is a Redis backend for CherryPy sessions.
The features of Riak 1.0 include:
Secondary Indices - allows a developer to tag Riak objects for indexing, and query the index to retrieve a list of matching keys
Riak Pipe - a new feature for higher-latency data processing; a new take on Map/Reduce style data processing
Integration of Riak Search - the powerful search engine built for Riak is now tightly integrated with the core 1.0 package
Lager - a new, simple and effective logging framework for Riak 1.0
LevelDB Support - Riak 1.0 includes available support for the LevelDB storage engine, further increasing user choice in deploying Riak
Administration Improvements - new tools make it easier to scale, manage and access a Riak cluster for developers and administrators
Upcoming Events and Webinars
Introduction to MongoDB and PHP
This webinar is an introduction to using MongoDB with PHP. The presenter will demonstrate how to connect to the database, perform CRUD operations and perform queries. Finally, he will summarize the community tools and libraries available for PHP and discuss why one would use them.
Introduction to Platform MapReduce
The presenter will introduce the Platform MapReduce architecture and enterprise-required features such as High Availability support. The discussions will cover development and IT topics including integration and advanced management of Hadoop programs. The session will include interactive demonstrations of the platform.
Mongo Boston is an annual one-day conference dedicated to MongoDB
Hadoop User Group UK
The October Hadoop meetup has three great talks, this time around the theme of data integration.
Informatica - Data integration and transformation
Flume for data loading into HDFS / Hive (Songkick)
Data Management for Hadoop
Mongo Munich 2011
Mongo Munich is a one-day conference dedicated to MongoDB.
LA-HUG - Getting Started with MapReduce: Hadoop For Beginners
We'll have a high-level overview of the Hadoop ecosystem and what it's used for, and the companies involved in LA-HUG will have an opportunity to present the ways they leverage Hadoop for real-world applications.
Here is a list of the new books published this month.
MongoDB and Python: Patterns and processes for the popular document-oriented database
Learn how to leverage MongoDB with your Python applications, using the hands-on recipes in this book. You get complete code samples for tasks such as making fast geo queries for location-based apps, efficiently indexing your user documents for social-graph lookups, and many other scenarios.
This comprehensive hands-on guide presents fundamental concepts and practical solutions for getting you ready to use NoSQL databases. The author begins with a helpful introduction on the subject of NoSQL, explains its characteristics and typical uses, and looks at where it fits in the application stack. Unique insights help you choose which NoSQL solutions are best for solving your specific data storage needs.
HBase: The Definitive Guide
If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away.
Big Data Glossary
To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.
NoSQL Jobs of the Week
Backend Engineer at Chartbeat (New York, NY, United States)
Our traffic numbers are growing, and so is our list of feature/project ideas. We are therefore looking for a backend engineer who can help with scaling, extending, and evolving our backend infrastructure to handle all this. The person we hire will play a very integral part in engineering. We expect the person to work on everything from architecture, implementation, infrastructure management, and firefighting -- just like the rest of us.
Sr. Search Engineer at Jive Software (Palo Alto, California, United States)
Motivated by big scale challenges? Have a passion for big data problems related to search? The Jive Big Data team is looking for experienced search engineers to build and improve big data processing pipelines and search indexing on top of Apache Hadoop and HBase. Our team works extensively with open source software on big data, social graph, and machine learning problems.
Server-Side Engineer at shopkick (Palo Alto, California, United States)