by Igor Kupczyński
25 May 2016 » ElasticNL meetup in Nieuwegein
07 Mar 2016 » The only fixed item on the schedule is the time of the dinner!
I’ve just come back from Chamberconf 2016. It is quite an unusual conference on the Polish Java scene. It is small, around 80 people in total. It is held in a palace, this year in Pałac Łagów. But most importantly, there are no sponsors and the format is not very strict.
06 Mar 2016 » Year 2016 brings new challenges. I joined Elastic to work on the Cloud team
I talked and blogged about elasticsearch quite a few times last year. At Egnyte, we built a search engine which allowed us to offer a great search experience to Egnyte’s customers. It was based on elasticsearch and we successfully scaled it up to billions of documents and dozens of terabytes of data. I had a lot of fun learning elasticsearch and working through proof-of-concept, implementation and then production deployment and scaling.
05 May 2015 » My talk at Poznan Java User Group
Elasticsearch in Production, Lessons Learned from Deploying and Managing Multi Billion Document Search Engine at Egnyte. I gave this talk on our search engine at Poznan Java User Group. You can find the slides here: http://igor.kupczynski.info/t/2015-05-05-jug-elasticsearch-in-production/slides.html.
06 Apr 2015 » Exploring elasticsearch caches and memory usage
UPDATE: this blog post was written for elasticsearch 1.6 and is no longer up to date.
26 Mar 2015 » Career Fairs at PUT, why not do distributed systems for a living
I ran an intro workshop on Elasticsearch at Poznań University of Technology today. I was there because of the Career Fairs, where my employer wanted to attract young talent and make our brand more recognizable among students and graduates of technical majors.
23 Mar 2015 » Easily provision an Elasticsearch cluster for fun, experimentation and profit
If you follow my blog you may know that I’m responsible for replacing my employer’s in-house search appliance with a new one backed by Elasticsearch. We have a huge data volume, even though we are not fully migrated to Elasticsearch yet. During our day-to-day operations we see a lot of interesting cases. I think we have reached the point where reading documentation, newsgroups and blogs is simply not enough to explain everything we see in the trenches. I often feel the need to experiment with an Elasticsearch cluster.
16 Jan 2015 » Run snake, run! How to profile python scripts with ease.
I have a python script which is responsible for a long-running migration. The script communicates with three other systems: it reads some data from systems #1 and #2, merges them and then pushes them to system #3. This is depicted below. The problem was that the migration ran at a pace I wasn’t happy with. Since most of the work the script does is communication with external systems, I wanted to know which one is slow. Python has a great built-in profiler to answer this kind of question. Follow this post to learn how to use it.
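A minimal sketch of what profiling such a script can look like with the standard library's cProfile and pstats modules. The three functions are hypothetical stand-ins for the calls to systems #1, #2 and #3 (the sleeps only simulate slow remote calls); they are not taken from the original script.

```python
import cProfile
import io
import pstats
import time


def read_from_system_one():
    time.sleep(0.02)  # hypothetical stand-in for a slow remote read
    return [1, 2, 3]


def read_from_system_two():
    time.sleep(0.01)  # hypothetical stand-in for a second remote read
    return [10, 20, 30]


def push_to_system_three(rows):
    time.sleep(0.005)  # hypothetical stand-in for the remote write
    return len(rows)


def migrate():
    # Merge the data from the two sources and push the result.
    merged = list(zip(read_from_system_one(), read_from_system_two()))
    return push_to_system_three(merged)


# Profile only the migration call.
profiler = cProfile.Profile()
profiler.enable()
pushed = migrate()
profiler.disable()

# Sort by cumulative time so the slowest call chains come first.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

A whole script can also be profiled without code changes by running `python -m cProfile -s cumulative script.py`, where `script.py` is a placeholder name.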
08 Jan 2015 » It is about time to add a new tool to your toolbox
06 Dec 2014 » Michael Lewis offers a good story plus a layman introduction to high-frequency trading
High Frequency Trading is quite a hot topic nowadays. It is a form of algorithmic trading where you gain an advantage by being faster than the other market participants. But how do high frequency traders make their money? The book Flash Boys by Michael Lewis, which I’ve recently read, is a good introduction to the topic. In this post I’ll summarize what I’ve learned from it.
26 Nov 2014 » Access your cluster securely in python or in java
Elasticsearch offers no security out of the box. The connections are over http or its native protocol, both unencrypted, for all the world to see. This is not a problem if your cluster is in the same datacenter as the rest of your infrastructure, safely behind a firewall. Quite often this is not the case. You can put the cluster in Amazon EC2 or Google Compute Engine or something similar. If you do this, then you need to encrypt the connection. Arguably, the easiest solution is to use https. There are various ways to configure the cluster, usually through some third-party proxy like nginx (please see the discussion). Recently, we decided to spin up a new cluster in Google Compute Engine and to allow only https access. I was curious whether my existing java and python client code would work out of the box.
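A minimal sketch of the client side in python, using only the standard library. It assumes the proxy in front of the cluster terminates https and checks HTTP basic auth; the host name and credentials are hypothetical. The request is built but not sent, so the snippet runs without a cluster.

```python
import base64
import ssl
import urllib.request


def build_secure_request(base_url, username, password):
    """Build (but do not send) an https request for the cluster health endpoint."""
    if not base_url.startswith("https://"):
        raise ValueError("refusing to talk to the cluster over plain http")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/_cluster/health")
    # Basic auth is an assumption about how the proxy is configured.
    request.add_header("Authorization", "Basic " + token)
    return request


def make_tls_context():
    # The default context verifies the server certificate and the host name.
    return ssl.create_default_context()


# Hypothetical host and credentials, for illustration only.
request = build_secure_request("https://es.example.com:9200", "user", "secret")
context = make_tls_context()
# To actually send it: urllib.request.urlopen(request, context=context)
```

Keeping certificate verification on is the point of the exercise: switching it off would encrypt the traffic but no longer prove that the client is talking to the right server.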
22 Oct 2014 » How to implement good search on product name in Elasticsearch
A lot of elasticsearch clusters will have the use case of searching by product name. It doesn’t really matter if the products are consumer goods, articles or files. The important thing is that users want to search by product name and find matching items. The products should be found if a user types their exact name or just types something close enough. In this post I’ll describe a possible implementation of this use case.
20 Oct 2014 » The problem with near-realtime search and *update-by-query*
In the startup I work for we’re evaluating the usage of elasticsearch to search for file metadata. Recently, when writing an integration test, I found an interesting bug, which I had missed when designing the solution. We use the update-by-query plugin to bulk update docs. In the test I wrote, I added a document and immediately updated it. The thing is that the document was not being updated. What happened? Read on…
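The title hints at near-realtime search: in Elasticsearch a newly indexed document only becomes visible to searches after a refresh. Below is a toy in-memory model of that general behavior, assuming that a query-based update can only touch documents a search can see. `ToyIndex` and its methods are hypothetical and are not Elasticsearch or plugin code.

```python
class ToyIndex:
    """Toy model: writes are buffered and become searchable only after refresh()."""

    def __init__(self):
        self._pending = {}      # indexed, but not yet visible to searches
        self._searchable = {}   # visible to searches and query-based updates

    def index(self, doc_id, doc):
        self._pending[doc_id] = dict(doc)

    def refresh(self):
        # Make every buffered write visible to searches.
        self._searchable.update(self._pending)
        self._pending.clear()

    def get(self, doc_id):
        # Lookup by id is modeled as realtime, so it also sees pending writes.
        if doc_id in self._pending:
            return self._pending[doc_id]
        return self._searchable.get(doc_id)

    def update_by_query(self, predicate, changes):
        # Only searchable documents can match the query.
        updated = 0
        for doc in self._searchable.values():
            if predicate(doc):
                doc.update(changes)
                updated += 1
        return updated


def owned_by_igor(doc):
    return doc["owner"] == "igor"


index = ToyIndex()
index.index("1", {"name": "report.pdf", "owner": "igor"})

# Immediately after indexing, the query-based update matches nothing.
missed = index.update_by_query(owned_by_igor, {"owner": "anna"})

# After a refresh the same update finds the document.
index.refresh()
hit = index.update_by_query(owned_by_igor, {"owner": "anna"})
```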
30 Sep 2014 » Edit your gmail messages with emacs and make http requests from your favorite editor
I recently found two interesting emacs plugins which can be applied to many use cases.
31 Jul 2014 » Use a laptop tray to create a standing desk at home with (almost) no effort.
08 Jul 2014 » Building your own github in the cloud
I recently migrated my private repos to DigitalOcean and GitLab. This post is a walkthrough of how to do it and summarizes my experience.
26 Jun 2014 » Will it fail together with the network?
Let us check the Elasticsearch behavior in the context of the CAP theorem.
20 May 2014 » Fast, off-heap java database with a collection interface
The problem statement is this: you have a set of text files. Each row in a file represents a row from the database. The rows can belong to one of two tables (they can represent one of two distinct entities). There is a parent-child, one-to-many relationship between the entities. The parent always comes before its children, but you do not know how many children are left. You need to denormalize the parent data into child entities and dump the data back to a text file. See the dumbed-down example below.
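A minimal sketch of the denormalization step in python. The pipe-separated row layout (`P|parent_id|parent_data` and `C|parent_id|child_data`) is an assumption made for illustration, and a plain in-memory dict stands in for whatever store holds the parents; this is not the off-heap database the post title refers to.

```python
def denormalize(lines):
    """Yield one output row per child, enriched with its parent's data.

    Assumed (hypothetical) input layout:
        P|<parent_id>|<parent_data>
        C|<parent_id>|<child_data>
    """
    parents = {}  # parent_id -> parent_data, kept because more children may follow
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        kind, parent_id, payload = line.split("|", 2)
        if kind == "P":
            parents[parent_id] = payload
        elif kind == "C":
            if parent_id not in parents:
                raise ValueError(f"child row before its parent: {line!r}")
            yield f"{parent_id}|{parents[parent_id]}|{payload}"
        else:
            raise ValueError(f"unknown row type: {line!r}")


rows = [
    "P|1|alice",
    "C|1|book",
    "P|2|bob",
    "C|1|pen",
    "C|2|lamp",
]
denormalized = list(denormalize(rows))
```

Because the number of remaining children is unknown, no parent can be dropped from the dict early, so its size grows with the number of parents in the input.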
24 Apr 2014 » How not to update firmware + solution
I own a Galaxy S3 smartphone, which I’m quite happy with. Android has its issues, but the hardware is very good and Samsung rolled out a few apps to compensate for the OS shortcomings. However, today I was really disappointed. They rolled out a new update. Updated 2014-05-06 with a solution.
15 Apr 2014 » Big data = big fail?
Martin Fowler recently summarized a “Reporting Database” pattern. The core idea behind this pattern is to split analytical and transactional processing by providing a separate database for the former. A separate database for reporting purposes is nothing new and this pattern is widespread in the enterprise space. What caught my interest though was his opinion on near real-time analytics.
30 Mar 2014 » What’s new in Java 8? Where to start learning?
Last week we saw the new release of Java. Version 8 brings a lot of new features into the language. I think the most important are lambda expressions, default method implementations in interfaces, and the new date-time API.
20 Mar 2014 » Never write your getters again!
My colleague recently introduced Project Lombok to our java code base. This is a great tool, which aims to make java source code less verbose by using a compile-time annotation processor. In this post I give you a basic example of Lombok usage and a description of how to configure IntelliJ IDEA to use it.
28 Feb 2014 » A study of a scandal with MongoDB
The online course I attended recently, MongoDB for Java Developers, has finished. According to the course dashboard
19 Jan 2014 » First impressions on MongoDB and M101J course
I finished the second week of the MOOC MongoDB for Java Developers, which I advertised in my latest post. I really like the course and I think it’s worth spending some time learning Mongo.
06 Jan 2014 » Free MongoDB course for Java Developers starts today.
10gen, the company behind MongoDB, offers a free online course, which ends with a certificate. It’s a great opportunity if you want to get up to speed with MongoDB quickly. The course starts today and you can still sign up.