Geek Igor

A blog on software development

by Igor Kupczyński


» Elastic Cloud Meetup

25 May 2016 » ElasticNL meetup in Nieuwegein

I had a great opportunity to talk about the Elastic Cloud architecture together with Morten Ingebrigtsen at an ElasticNL meetup, this time held at a new venue in Nieuwegein.

» Chamberconf 2016

07 Mar 2016 » The only fixed item on the schedule is the time of the dinner!

I’ve just come back from Chamberconf 2016. It is quite an unusual conference on the Polish Java scene. It is small, around 80 people in total, and it is held in a palace (this year in Pałac Łagów). But most importantly, there are no sponsors and the format is quite informal.

» You Know, for Search...

06 Mar 2016 » The year 2016 brings new challenges: I joined Elastic to work on the Cloud team

I talked and blogged about elasticsearch quite a few times last year. At Egnyte, we built a search engine that allowed us to offer a great search experience to Egnyte’s customers. It was based on elasticsearch, and we successfully scaled it up to billions of documents and dozens of terabytes of data. I had a lot of fun learning elasticsearch and working through the proof of concept, the implementation, and then production deployment and scaling.

» Elasticsearch in Production

05 May 2015 » My talk at Poznan Java User Group

Elasticsearch in Production: Lessons Learned from Deploying and Managing a Multi-Billion Document Search Engine at Egnyte. I gave this talk on our search engine at the Poznan Java User Group. You can find the slides here:

» Elasticsearch Caches - Fielddata

06 Apr 2015 » Exploring elasticsearch caches and memory usage

UPDATE: this blog post was written for elasticsearch 1.6 and is no longer up to date.

» Elasticsearch Workshop at Poznań University of Technology

26 Mar 2015 » Career Fairs at PUT: why not do distributed systems for a living?

Today I conducted an introductory Elasticsearch workshop at Poznań University of Technology. I was there for the Career Fairs, where my employer wanted to attract young talent and make our brand more recognizable among students and graduates of technical majors.

» Make Me a Cluster

23 Mar 2015 » Easily provision an Elasticsearch cluster for fun, experimentation and profit

If you follow my blog, you may know that I’m responsible for replacing my employer’s in-house search appliance with a new one backed by Elasticsearch. We already have a huge data volume, even though we are not fully migrated to Elasticsearch yet. In our day-to-day operations we see a lot of interesting cases. I think we have reached the point where reading documentation, newsgroups and blogs is simply not enough to explain everything we see in the trenches. I often feel the need to experiment with an Elasticsearch cluster.

» Profiling Python Scripts

16 Jan 2015 » Run snake, run! How to profile python scripts with ease.

I have a python script which is responsible for a long-running migration. The script communicates with three other systems: it reads some data from systems #1 and #2, merges it and then pushes it to system #3. This is depicted below. The problem was that the migration ran at a pace I wasn’t happy with. Since most of the script’s work is communication with external systems, I wanted to know which of them was slow. Python has a great built-in profiler to answer this kind of question. Follow this post to learn how to use it.
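
A minimal sketch of the idea, assuming the three systems can be stood in for by the hypothetical functions `read_from_system_1`, `read_from_system_2` and `push_to_system_3` (the `time.sleep` calls only simulate network latency). It uses the standard library's `cProfile` and `pstats` modules to record one run and rank functions by cumulative time, which is how the slow external call stands out; it is not the script from the post.

```python
import cProfile
import io
import pstats
import time


# Hypothetical stand-ins for the external systems; sleeps simulate latency.
def read_from_system_1(batch):
    time.sleep(0.01)
    return [{"id": i, "a": i} for i in batch]


def read_from_system_2(batch):
    time.sleep(0.03)
    return [{"id": i, "b": i * 2} for i in batch]


def push_to_system_3(rows):
    time.sleep(0.05)


def migrate(batches):
    for batch in batches:
        left = read_from_system_1(batch)
        right = {row["id"]: row for row in read_from_system_2(batch)}
        merged = [{**row, **right[row["id"]]} for row in left]
        push_to_system_3(merged)


def profile_migration(batches):
    """Run migrate() under cProfile and return cumulative seconds per function."""
    profiler = cProfile.Profile()
    profiler.enable()
    migrate(batches)
    profiler.disable()

    # Print the ten most expensive entries, sorted by cumulative time.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    stats.print_stats(10)
    print(out.getvalue())

    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers).
    return {key[2]: value[3] for key, value in stats.stats.items()}


if __name__ == "__main__":
    timings = profile_migration([range(0, 10), range(10, 20)])
    slowest = max(
        ("read_from_system_1", "read_from_system_2", "push_to_system_3"),
        key=timings.get,
    )
    print("slowest external call:", slowest)
```

For a whole script, a similar report is available without code changes via `python -m cProfile -s cumulative your_script.py`, where `your_script.py` is a placeholder name.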

» Delaying Proxy in NodeJS

08 Jan 2015 » It is about time to add a new tool to your toolbox

Tags: nodejs

During the Christmas holiday season I had some time to learn NodeJS. In this post I’ll show how to create a simple (but useful!) proxy server in less than 30 lines of code. I’ll also give some hints on how to get started with JavaScript with no prior experience.

» How do High Frequency Traders Make Their Money?

06 Dec 2014 » Michael Lewis offers a good story plus a layman’s introduction to high-frequency trading

High Frequency Trading is quite a hot topic nowadays. It is a form of algorithmic trading where you gain an advantage by being faster than the other market participants. But how do high frequency traders make their money? The book Flash Boys by Michael Lewis, which I recently read, is a good introduction to the topic. In this post I’ll summarize what I’ve learned from it.

» Accessing Elasticsearch Cluster over https

26 Nov 2014 » Access your cluster securely in python or in java

Elasticsearch offers no security out of the box. Connections go over http or its native protocol, both unencrypted, for all the world to see. This is not a problem if your cluster is in the same datacenter as the rest of your infrastructure, safely behind a firewall. Quite often this is not the case: you can put the cluster in Amazon EC2, Google Compute Engine or something similar. If you do, then you need to encrypt the connection. Arguably, the easiest solution is to use https. There are various ways to configure the cluster, usually through a third-party proxy like nginx (please see the discussion). Recently, we decided to spin up a new cluster in Google Compute Engine and to allow only https access. I was curious whether my existing java and python client code would work out of the box.

» Searching for Product Name in Elasticsearch

22 Oct 2014 » How to implement good search on product name in Elasticsearch

A lot of elasticsearch clusters have to support searching by product name. It doesn’t really matter whether the products are consumer goods, articles or files. The important thing is that users want to search by product name and find matching items. The products should be found whether a user types their exact name or just something close enough. In this post I’ll describe a possible implementation of this use case.

» Elasticsearch Refresh

20 Oct 2014 » The problem with near-realtime search and update-by-query

At the startup I work for, we’re evaluating elasticsearch for searching file metadata. Recently, while writing an integration test, I found an interesting bug that I had missed when designing the solution. We use the update-by-query plugin to bulk update docs. In the test, I added a document and immediately updated it. The thing is, the document was not being updated. What happened? Read on…
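
The visibility rule behind this kind of bug can be illustrated with a toy model. The sketch below assumes a hypothetical `ToyIndex` class in which writes land in a buffer and only become searchable after `refresh()`. Because update-by-query first finds its documents with a search, a document indexed a moment earlier is simply not matched. It mimics the near-realtime behavior only and is not how Elasticsearch is implemented; in a real cluster, calling the refresh API on the index before running the query is one way to make the new document visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyIndex:
    """Hypothetical near-realtime index: writes are buffered and become
    searchable only after refresh(). Not Elasticsearch's real internals."""

    searchable: Dict[str, dict] = field(default_factory=dict)
    buffer: Dict[str, dict] = field(default_factory=dict)

    def index(self, doc_id: str, doc: dict) -> None:
        # New or changed documents are buffered, invisible to search for now.
        self.buffer[doc_id] = dict(doc)

    def refresh(self) -> None:
        # Make everything written so far visible to search.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, predicate: Callable[[dict], bool]) -> Dict[str, dict]:
        return {i: d for i, d in self.searchable.items() if predicate(d)}

    def update_by_query(self, predicate: Callable[[dict], bool], changes: dict) -> int:
        # Find matches with a search, then reindex each one.
        # Documents that were never refreshed are never found.
        hits = self.search(predicate)
        for doc_id, doc in hits.items():
            self.index(doc_id, {**doc, **changes})
        return len(hits)


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("1", {"folder": "a", "owner": "igor"})
    in_folder_a = lambda d: d["folder"] == "a"
    print(idx.update_by_query(in_folder_a, {"owner": "bob"}))  # 0: not refreshed yet
    idx.refresh()
    print(idx.update_by_query(in_folder_a, {"owner": "bob"}))  # 1
```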

» Two Recently Discovered Emacs Plugins

30 Sep 2014 » Edit your gmail messages with emacs and make http requests from your favorite editor

I recently found two interesting emacs plugins which can be applied to many use cases.

» Standing Desk at Home

31 Jul 2014 » Use a laptop tray to create a standing desk at home with (almost) no effort.

It looks like the standing desk is not a niche anymore. There are a lot of articles online about its benefits and the issues people face (like this or this).

» Host Gitlab on DigitalOcean

08 Jul 2014 » Building your own GitHub in the cloud

I recently migrated my private repos to DigitalOcean and GitLab. This post is a walkthrough of how to do it and summarizes my experience.

» CAP Theorem and Elasticsearch

26 Jun 2014 » Will it fail together with the network?

Let us check how Elasticsearch behaves in the context of the CAP theorem.

» MapDB

20 May 2014 » Fast, off-heap java database with a collection interface

The problem statement is this: you have a set of text files. Each row in a file represents a row from a database. The rows can belong to one of two tables (that is, they represent one of two distinct entities). There is a parent-child, one-to-many relationship between the entities. A parent always comes before its children, but you do not know how many of its children are still to come. You need to denormalize the parent data into the child entities and dump the data back to a text file. See the dumbed-down example below.
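
A minimal Python sketch of the streaming approach, under an assumed row layout (`P,<parent_id>,...` for parents and `C,<child_id>,<parent_id>,...` for children; both formats are hypothetical). Since a parent's last child can show up anywhere later in the input, every parent must stay available until the end. The sketch keeps parents in a plain `dict`, the role an off-heap store such as MapDB plays when the parents do not fit in memory. In Python, a disk-backed mapping like the standard library's `shelve` could be passed as `store` instead; that is a swapped-in technique, not the post's implementation.

```python
import csv
import io
import sys
from typing import Iterable, Iterator, List, MutableMapping, Optional


def denormalize(
    lines: Iterable[str],
    store: Optional[MutableMapping[str, List[str]]] = None,
) -> Iterator[List[str]]:
    """Yield each child row with its parent's fields appended.

    Hypothetical row layout (CSV):
      P,<parent_id>,<parent fields...>
      C,<child_id>,<parent_id>,<child fields...>
    """
    # Parents are kept until the end of input: we never know whether
    # more children of a given parent are still to come.
    parents = {} if store is None else store
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        kind = row[0]
        if kind == "P":
            parents[row[1]] = row[2:]
        elif kind == "C":
            child_id, parent_id, child_fields = row[1], row[2], row[3:]
            # The parent always precedes its children, so it is already stored.
            yield [child_id, *child_fields, parent_id, *parents[parent_id]]
        else:
            raise ValueError(f"unknown row type: {kind!r}")


if __name__ == "__main__":
    source = io.StringIO(
        "P,1,Alice\nC,10,1,apple\nC,11,1,pear\nP,2,Bob\nC,20,2,plum\n"
    )
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerows(denormalize(source))
```

Run as a script, this prints `10,apple,1,Alice`, `11,pear,1,Alice` and `20,plum,2,Bob`, one per line. Memory use grows with the number of parents, not children, which is why the choice of parent store matters.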

» Samsung Fail

24 Apr 2014 » How not to update firmware + solution

I own a Galaxy S3 smartphone, which I’m quite happy with. Android has its issues, but the hardware is very good and Samsung rolled out a few apps to compensate for the OS’s shortcomings. However, today I was really disappointed: they rolled out a new update. Updated 2014-05-06 with a solution.

» Near real-time analytics

15 Apr 2014 » Big data = big fail?

Tags: big data fail

Martin Fowler recently summarized the “Reporting Database” pattern. The core idea behind this pattern is to split analytical and transactional processing by providing a separate database for the former. A separate database for reporting purposes is nothing new, and this pattern is widespread in the enterprise space. What caught my interest, though, was his opinion on near real-time analytics.

» Java 8 Released

30 Mar 2014 » What’s new in Java 8? Where to start learning?

Tags: java

Last week saw the release of a new version of Java. Version 8 brings a lot of new features to the language. I think the most important are:

- lambda expressions,
- default method implementations in interfaces,
- the new date and time API.

» Project Lombok

20 Mar 2014 » Never write your getters again!

My colleague recently introduced Project Lombok to our java code base. This is a great tool which aims to make java source code less verbose by using a compile-time annotation processor. In this post I give a basic example of using Lombok and describe how to configure IntelliJ IDEA to use it.

» A Note on MongoDB for Java Developers Course

28 Feb 2014 » A study of a scandal with MongoDB

The online course MongoDB for Java Developers, which I attended recently, has finished. According to the course dashboard

» Second Week of Mongo M101J Course

19 Jan 2014 » First impressions on MongoDB and M101J course

I finished the second week of the MOOC MongoDB for Java Developers, which I advertised in my previous post. I really like the course, and I think it’s worth spending some time learning Mongo.

» M101J - MongoDB for Java Developers

06 Jan 2014 » Free MongoDB course for Java Developers starts today.

10gen, the company behind MongoDB, offers a free online course which ends with a certificate. It’s a great opportunity if you want to get up to speed with MongoDB quickly. The course starts today and you can still sign up.
