Wednesday, August 28, 2013

GSoC Weeks 8 & 9

I've spent the last couple of weeks responding to code reviews and cleaning up my commit history. This included a significant consolidation of the #1382 patch, which can be found here. The patch is being reviewed and we have decided that it will be included in Tahoe 1.11.0. While users won't see much of a difference from the patch, file reliability will increase significantly.

We've also decided to include #1057 in 1.11.0, which will apply the servers-of-happiness test to mutable uploads. This will be a great win for end users because they won't be confused by the inconsistent behavior between mutable and immutable files.

Along the same lines as #1382 and #1057, I spent some time trying to reproduce #1830, which is an issue with happiness settings. Users have reported instances in which their happiness setting is ignored and the uploader falls back to the default value of 7. Sadly, I couldn't reproduce the problem with a unit test on trunk, and I didn't find anything that looked suspicious when reading through the codebase. My unit test can be viewed on this branch. The ticket needs more investigation before it can be closed, so if anyone is hitting this issue, please post the details on trac so we can properly reproduce the bug.

I also wrote a patch for #671, which brings back the size limit option for storage nodes. This is something I have wanted ever since I started using Tahoe, so it was nice to be able to scratch that itch. The patch relies on the new LeaseDB branch to calculate the amount of data stored on a given node, so it won't land until 1.12 because of a significant performance regression in LeaseDB. However, I have been running the branch in production and it has unit tests, so if you need this feature you should be okay using it.
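Once LeaseDB can cheaply report how much data a node is storing, the size-limit check itself is simple. Below is a minimal sketch of that check, not the actual patch; the function name and arguments are mine:

```python
def can_accept_share(share_size, used_space, size_limit):
    """Decide whether a storage node with a configured maximum size
    should accept another share of `share_size` bytes.

    `used_space` is the total number of bytes the node currently holds,
    which LeaseDB can answer with a single query instead of a slow crawl.
    `size_limit` is None when no limit is configured.
    """
    if size_limit is None:
        return True
    return used_space + share_size <= size_limit
```

The hard part has never been this comparison; it is getting an accurate, cheap value for the used space, which is exactly what LeaseDB provides.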

Finally, I've spent some time improving the buildbot configuration for automated testing. My first improvement was to create a CouchDB instance at IrisCouch so that test results from the buildslaves are uploaded to a central database. This data will be useful for tracking possible performance regressions over longer periods of time than one or two patches.
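Pushing a result document from a buildslave is straightforward with the python-couchdb client. The snippet below is only a sketch of the idea; the URL, database name, and document fields are placeholders rather than the real configuration:

```python
import couchdb

# Placeholder URL; the real database is hosted at IrisCouch.
server = couchdb.Server('https://example.iriscouch.com/')
db = server['buildbot-test-results']

# One document per test run, pushed when the run finishes.
doc = {
    'buildslave': 'example-slave',
    'revision': 'deadbeef',
    'suite': 'allmydata.test',
    'elapsed_seconds': 412.7,
    'passed': True,
}
doc_id, doc_rev = db.save(doc)
```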




Friday, August 9, 2013

GSoC Update - Week 7

For the past couple of weeks I have been writing patches for servers-of-happiness and for issues in the command-line interface (CLI).

  • Ticket #1057 (Github) - Alter mutable files to use servers-of-happiness: Tahoe uses the servers-of-happiness measurement for immutable files, but not for mutable files. Instead, mutable files use the old shares-of-happiness test, which pays no attention to how shares are distributed over multiple servers. This behavior is confusing for users because a mutable upload can succeed in a situation where an immutable upload would fail. Closing this ticket will improve Tahoe's usability as well as increase file redundancy for successful uploads. It is also required before I can implement the upload strategy of happiness for mutable files.
  • Ticket #2034 (Github) - Mutable file upload is sensitive to the number of servers: While developing a patch for #1057, I found a small issue in Tahoe's mutable upload process. If the number of servers on the grid is greater than k + N, the query process can end prematurely due to a race condition, which causes Tahoe to incorrectly report a file as unhealthy. The race condition is unlikely to appear in production because it requires little to no lag between the server query and the response, but it causes problems when testing servers of happiness with mutable files.
  • Ticket #2027 (Github) - Inconsistent "tahoe cp" behavior: In Tahoe 1.10, there is a small bug in the CLI that prevents users from copying a file off of the grid without specifying a file name. The bug is caused by a Unicode assertion, so my patch simply converts the destination to Unicode.
  • Ticket #712 (Github) - Tahoe cp -r doesn't copy the parent: When using the CLI, "tahoe cp" is supposed to mimic the behavior of "cp". However, when Tahoe copies a directory recursively, it does not copy the parent directory like cp does. My patch fixes this small issue by creating a different target directory for the CLI.
  • Ticket #1836 (Github) - Use leasedb for share count: Right now Tahoe uses a crawler to keep track of share information. This is inefficient, especially for large servers, so there is a branch in development that keeps share information in an SQLite database. My patch removes the crawler and makes the necessary SQL queries instead; a rough sketch of the kind of query involved follows this list. Currently the branch cannot be merged with trunk because of a significant performance regression, but the core dev team has slowly been tracking down the bugs. I'm interested in this branch because I would really like Tahoe to have a maximum size limit again.
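The table and column names below are only my guess at the leasedb schema, but they show why the branch is appealing: the numbers the crawler computes by walking every share on disk become a single query.

```python
import sqlite3

# Illustrative only: the real schema is defined in the leasedb branch
# and its table/column names may differ from the ones assumed here.
conn = sqlite3.connect('leasedb.sqlite')
share_count, used_space = conn.execute(
    "SELECT COUNT(*), COALESCE(SUM(used_space), 0) FROM shares"
).fetchone()
print("storing %d shares totalling %d bytes" % (share_count, used_space))
```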
I was also able to finish the new IRC bot for #tahoe-lafs on freenode during my free time and it has been working out great. The new bot will parse multiple ticket numbers and it will post updates from trac. You can find the source code here, and I've spun the IRC code out into another package that you can find here.

Friday, July 19, 2013

GSoC Update (Week 5)

After a few days of struggling with one test in particular, my patch to Tahoe's upload algorithm passes all tests! It is waiting to be reviewed and will hopefully be merged into trunk in a couple of weeks. Once my patch has been reviewed, I plan on implementing a similar upload algorithm for mutable files. While I could move on to a different aspect of Tahoe, continuing to let the behavior of immutable and mutable files diverge would only create more trouble for users. It is also best to write the patch while the details are fresh in my mind.

In the meantime, I have been working on redefining Tahoe's measure of file health to use servers-of-happiness. A patch for immutable files is already finished and can be viewed here. Right now some of the tests don't pass, but I have decided to put the patch on hold until mutable files use servers-of-happiness during upload. This is because it would be inconsistent for the uploader to produce a share distribution that the checker then deems unhealthy.

Today I plan to start working on some minor tickets associated with Tahoe's new cloud backend system, which allows users to use popular cloud storage services as storage nodes. It also makes some existing operations faster for regular storage nodes, thanks to architectural changes that were necessary to get cloud backends to work.

In my free time I have been writing a new IRC bot for the #tahoe-lafs channel on freenode. The current bot is nice because it will display ticket information when someone refers to a ticket by number (e.g. #1382). However, the bot will only parse one ticket per line, which is annoying at times. Since I could not find the source code for the existing bot, I decided to write a new one. Right now my bot successfully parses multiple tickets per line, and my plan is to integrate trac updates through RSS. I'd also like to add interaction with Github, such as knowing when a new pull request is opened, since there is some disconnect between the activity on trac and the activity on Github. My plan is to open source the bot once I clean up my code.
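The multi-ticket parsing itself is tiny. Here is a sketch of the core idea rather than the bot's actual code; the trac URL pattern is the one the project's ticket pages use:

```python
import re

TICKET_RE = re.compile(r'#(\d+)')
TRAC_URL = 'https://tahoe-lafs.org/trac/tahoe-lafs/ticket/%d'

def tickets_in(line):
    """Return a trac URL for every ticket number mentioned in a chat line."""
    return [TRAC_URL % int(num) for num in TICKET_RE.findall(line)]

# tickets_in("see #1382 and #1057") yields two links, not just the first one
```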

Tuesday, July 2, 2013

GSoC Update (Week 3)

Last week I was able to clean up most of my code and finish integrating the new algorithm into the current uploader. All of the existing tests pass except for one, which needs to be rewritten to reflect the new behavior of the uploader. I've also tested the new upload algorithm on the test grid and everything seems to work well in practice. I'm excited to work with my mentors to clean up the code so that other users in the community can test it out!

Hopefully my project will be finished in a week or so and I will be able to start hacking away at a new project for the Tahoe-LAFS community.

Monday, June 24, 2013

Upload Strategy of Happiness

As I mentioned in the previous post, my GSoC project is to implement a new upload algorithm in Tahoe, called the "Upload Strategy of Happiness". This algorithm was developed by Kevan Carstensen, one of Tahoe's contributors, in order to meet the servers of happiness test by maximizing the distribution of file shares over a network. The relationship between shares and servers is modeled using a bipartite graph, where an edge exists from share n to server s if and only if s stores n.

The algorithm is as follows:

Let N be the set of all shares and S be the set of all servers.

  1. Construct a bipartite graph A from the existing allocations on the network such that an edge exists from share n to server s if and only if s stores n. 
  2. Calculate the maximum matching graph of A using the Edmonds-Karp algorithm and let this graph be A-max. 
  3. Let N-existing be the set of all shares in A-max and let S-existing be the set of all servers in A-max. Construct the bipartite graph B whose shares are N - N-existing and whose servers are S - S-existing, where an edge exists from share n to server s if and only if s can hold n. 
  4. Calculate the maximum matching graph of B and let this graph be B-max.
  5. Merge A-max and B-max.
The product of merging A-max and B-max is the maximum distribution of shares over unique servers on the network. This technique is useful because it reuses as many existing shares as possible, which reduces the amount of data that has to be transferred. Once shares have been spread over the maximum number of unique servers, the remaining shares are distributed as evenly as possible by looping over the set of servers until there are no more shares to place. Ultimately this resolves edge cases in Tahoe where certain server configurations could not accept new files even though a placement satisfying the test existed.

Note: If you are unfamiliar with the maximum matching of bipartite graphs but interested in learning more, I highly recommend chapter 26 of Introduction to Algorithms, titled Maximum Flow.
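To make the steps above concrete, here is a minimal Python sketch of steps 2 through 5 plus the final even distribution. It is not the actual uploader code: it uses the simpler augmenting-path (Kuhn) method for maximum matching, which is equivalent to Edmonds-Karp on a unit-capacity bipartite graph, and it assumes in step 3 that every remaining server can hold every remaining share.

```python
def max_matching(edges):
    """Maximum bipartite matching via augmenting paths.

    `edges` maps each share to the servers that could hold it; the result
    maps shares to servers, with at most one share matched per server.
    """
    match = {}  # server -> share

    def try_place(share, visited):
        for server in edges.get(share, ()):
            if server in visited:
                continue
            visited.add(server)
            # Take a free server, or evict a share that can move elsewhere.
            if server not in match or try_place(match[server], visited):
                match[server] = share
                return True
        return False

    for share in edges:
        try_place(share, set())
    return {share: server for server, share in match.items()}


def place_shares(existing, all_shares, all_servers):
    """Sketch of the strategy: reuse existing placements, then fill in."""
    a_max = max_matching(existing)                            # step 2
    remaining_shares = [n for n in all_shares if n not in a_max]
    remaining_servers = [s for s in all_servers
                         if s not in a_max.values()]
    b = {n: remaining_servers for n in remaining_shares}      # step 3
    b_max = max_matching(b)                                   # step 4
    placement = dict(a_max)
    placement.update(b_max)                                   # step 5: merge
    # Shares left over after the merge are spread as evenly as possible.
    leftovers = [n for n in all_shares if n not in placement]
    for i, share in enumerate(leftovers):
        placement[share] = all_servers[i % len(all_servers)]
    return placement


# Example: two shares already live on server A, two have never been placed.
existing = {'sh0': ['A'], 'sh1': ['A']}
print(place_shares(existing, ['sh0', 'sh1', 'sh2', 'sh3'], ['A', 'B', 'C']))
# Result: the shares end up on all three servers, with the fourth share
# wrapping around to a server that already holds one.
```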

Last week I was able to implement the algorithm and start refactoring the current uploader. While the new uploader isn't finished yet, some of the test cases which used to fail now succeed, which is pretty exciting. Hopefully I will be able to clean up my code and work out some of the bugs this week. To view my progress, check out the Github branch here.

Monday, June 17, 2013

File Upload in Tahoe-LAFS

Introduction

As a brief introduction, my name is Mark J. Berger and I'm a student at Stanford University studying computer science. In conjunction with Google's Summer of Code (GSoC) program, I am improving Tahoe-LAFS on behalf of the Python Software Foundation. This blog will be dedicated to my work on Tahoe-LAFS and all other things GSoC related. If you want to contact me or check out some of my personal projects, visit markjberger.com.

Tahoe-LAFS & Servers of Happiness

About Tahoe-LAFS

Tahoe-LAFS is an open-source, cloud-based storage system that distributes user data over multiple servers, similar to Dropbox and other cloud storage services. However, unlike other services, Tahoe is built with provider-independent security. All user data is encrypted in such a way that the service provider cannot access it, guaranteeing that the user's privacy is respected. For more information on what Tahoe-LAFS is and how it works, check out the about page on the Tahoe-LAFS website here.

Servers of Happiness

To accomplish data redundancy, Tahoe uses erasure coding to split a file into n pieces, of which only k are required to reassemble the file. By default, Tahoe sets the encoding parameters to n = 10 and k = 3, so any file can be reassembled from any three of the ten pieces that exist on the network.

While erasure coding is great for redundancy purposes, it can be misleading. If all ten pieces of a given file are on the same server, data redundancy hasn't increased significantly. If that server fails, the data is no better off split into ten pieces than it would have been as a single copy. Erasure coding has only increased the size of the file without adding any system-wide benefit.

To mitigate this situation, Tahoe uses a criterion called "Servers of Happiness", which introduces a new parameter h. Tahoe ensures that the n shares are spread across at least h distinct servers, thereby guaranteeing increased redundancy. However, Servers of Happiness is only a test, and the current upload algorithm was not built to pass it. As a result, situations arise in which the Servers of Happiness test could be met, but the uploader does not know how to meet it, and the upload fails. My project aims to fix this issue, along with other issues that stem from the outdated uploader, by implementing an "Upload Strategy of Happiness" that knows how to satisfy the Servers of Happiness criterion. This will increase overall data redundancy as well as remove inefficiencies in the uploader.
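If you want to tune these parameters on your own node, they live in the [client] section of tahoe.cfg as shares.needed (k), shares.happy (h), and shares.total (n); the values shown below are the defaults:

```
[client]
shares.needed = 3
shares.happy = 7
shares.total = 10
```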

Today is the first day of programming in GSoC and I've just started to refactor the uploader to implement the Upload Strategy of Happiness. In my next post I hope to explain the new strategy I will be implementing and discuss any progress I have made this week. To follow my progress, subscribe to this blog, or follow me on Github and check out the project branch here.