Thursday, December 13, 2012

Locked yourself out of your WNDR3800 on OpenWRT? Here's how you recover

Oops...

This week, I made a mistake when editing my network-bridge configuration in OpenWRT's LuCI web interface. After I pressed "Save and Apply", I noticed that things were taking quite a bit longer than usual. Then I lost internet connectivity from my backend machines, and I realized that I had made a mistake.

This was no cause for panic, since routers and firmwares usually have a recovery option: I would just look up how to do that on the intern... oh, wait... :(

Fortunately, I could enable WiFi tethering on my Android phone (that has a mobile data package), so I could use a laptop to look up the solution on the internet.

Solving the problem

Actually, the solution is quite easy: OpenWRT has a built-in recovery (failsafe) mode that you can enable by pressing the correct button at the correct time during the boot procedure. To that end, configure a backend machine with the static IP 192.168.1.2, and start a tcpdump:

# tcpdump -Ani eth0 port 4919 and udp

Now switch the router off and back on. After some 10-15 seconds, your tcpdump will show a message saying (among a lot of dots): "Please press button now to enter failsafe". At that point, on the WNDR3800, press the lowermost button (the one that is normally used for WPS auto-setup). The power LED will then start blinking very rapidly. Now, wait another ~30 seconds, and telnet into the router from your backend machine:

$ telnet 192.168.1.1

This will drop you straight to a root shell. You will want to remount the router's root filesystem read-write:

# mount -o remount,rw /

You can now fix the problem (in the files under /etc/config) and reboot.

Tuesday, November 6, 2012

Multi-site private IPv6 networking using ULA and IPSEC

Wait, what?! Why?

So here is the situation: I have a home network behind a Netgear WNDR3800 router running OpenWRT, and I rent a remote server on which I run Xen with several VMs on a virtual backend network. Both sites have full IPv6 connectivity; all backend systems have a global IPv6 address, and although they are free to communicate with the entire (IPv6) world, I do have basic firewalling in place that allows new inbound connections only to some internal IPv6 hosts running OpenSSH.

There also is the usual IPv4 NAT (NAT44) story on both backend networks, but this post is not about IPv4.

What I want is this: I want systems on both backend networks to be able to openly talk to each other over IPv6, yet in a secure way. In other words: to internal systems, I want a completely open and private IPv6 network.

Is that even possible?

Well, yes, it is! Here is how.

IPv6 Unique Local Addresses (ULA)

Even though IPv6 prefixes are usually stable, I do not want to depend on that when/if I switch providers. Fortunately, IPv6 was built from the ground up on the concept that an interface can have multiple IPv6 addresses; two of them are the normal Link Local address (in fe80::/64) and the Global address (in 2000::/3), but it is possible to add more.

One of the possibilities that IPv6 offers is Unique Local Addresses; these are addresses in fd00::/8 (and fc00::/8, though that should not be used until there is a global registry) that one can use in the same way as one would have used the IPv4 private address space (10.0.0.0/8, 192.168.0.0/16, and friends). You can randomly generate a /48 in fd00::/8 by choosing 40 more bits, e.g. by running noise through sha256sum or something similar. Within this /48, you can create as many subdivisions as you want, though it is customary to create /64s, so that IPv6 autoconfiguration works on your clients.

The networks you create in fd00::/8 should not be routed outside your internal network, and nobody will willingly route these prefixes from the external world to you. However, internally, and between sites, you can route them any way you want.

In the remainder of this post, I will describe how to set up ULA on both sites, how to connect both sites, and how to then make the inter-site connection secure.

Choosing a ULA and setting it up on both sites

The relevant RFC (RFC 4193) suggests that you generate 40 random bits using any sufficiently random method, e.g. by running some data from /dev/urandom through sha256sum, and copying the first 10 hex digits. Let us, for the sake of simplicity, assume the following ULA:

fd12:3456:789a::/48

This is of course very non-random, and you should not use it yourself, but it makes this post a bit easier to read.
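
As a minimal sketch of that procedure (assuming a Linux box with sha256sum and /dev/urandom; the function names ula_from_hex and random_ula are made up for this example and are not part of any tool), a prefix could be generated along these lines:

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of any tool.

# Turn 10 hex digits (40 bits) into a /48 prefix inside fd00::/8.
ula_from_hex() {
    hex=$1
    g1=$(printf '%s' "$hex" | cut -c1-2)
    g2=$(printf '%s' "$hex" | cut -c3-6)
    g3=$(printf '%s' "$hex" | cut -c7-10)
    printf 'fd%s:%s:%s::/48\n' "$g1" "$g2" "$g3"
}

# Hash some random data and keep the first 10 hex digits of the digest.
random_ula() {
    ula_from_hex "$(head -c 64 /dev/urandom | sha256sum | cut -c1-10)"
}

random_ula
```

Feeding the digits 123456789a to ula_from_hex gives back the example prefix used in this post; random_ula prints a fresh prefix on every run, so run it once and write the result down.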

Now that we have our ULA, we could pause for a second, and appreciate how unimaginably large the network that we just created is. This is a /48 prefix, which means that we have 128 - 48 = 80 bits of address space all to ourselves! It is customary to divide the space into 65,536 /64 networks, each of which can hold ~2^64 unique addresses. Now that is a large number: 2^64 is about 2 * 10^19. That means that if we buy 20 million 1 TByte hard drives, we could assign a unique IPv6 address to each byte on each drive! And within our ULA, we could have 65,536 stacks of 20 million 1 TByte drives :-).

Anyway, in practice we will have fewer devices. Let's say that we choose obvious yet simple networks for both sites:

  • Site 1: Network fd12:3456:789a:1::/64 .
  • Site 2: Network fd12:3456:789a:2::/64 .
We will also need a network to connect both sites, but I will get to that later.

Setting up ULA on both sites consists of setting a (preferably simple) address for the router, and announcing the network prefix to other machines on the site network.

Setting up ULA on site 1

In my case, site 1 has a router running OpenWRT 10.03. Since the internal interface already has a static address for the globally routable network (also a /64), and the LuCI web interface on the router does not allow me to add multiple IPv6 addresses on the lan interface, I define an alias in /etc/config/network:

config 'alias' 'lanula'
        option 'interface' 'lan'
        option 'proto' 'static'
        option 'ip6addr' 'fd12:3456:789a:1::1/64'

This will give the router the first available (::1) address within site 1's network. I then need to tell radvd to start announcing the prefix. That is done by editing /etc/config/radvd:

config 'prefix'
        option 'interface' 'lan'
        option 'AdvOnLink' '1'
        option 'AdvAutonomous' '1'
        list 'prefix' '2***:****:****::/64 fd12:3456:789a:1::/64'
        option 'ignore' '0'

Here, the starred-out 2***:****:****::/64 is my actual global prefix. Just add the ULA prefix on that same line. After rebooting the router, your clients will automatically obtain an address in both the global prefix and the ULA prefix. In fact, if you use IPv6 privacy extensions (Linux does, usually), you will even get a temporary IPv6 address in both networks.

At this point, it is a good idea to ensure that you can ping6 fd12:3456:789a:1::1 from a client.

Setting up ULA on Site 2

In my case, the "router" on Site 2 is the dom0 domain of a Xen box that runs the other backend machines as domU domains. It, too, already has full IPv6 connectivity; my server hoster routes a /64 to my dom0, which I then distribute to my domUs using radvd.

The dom0 in question is a vanilla Ubuntu Server release, so I can configure the interfaces in /etc/network/interfaces. However, since I can add only one IPv6 address (in addition to the link-local address) in there, I have to use the up/down logic to assign the ULA address.

iface ibr0 inet6 static
  address 2###:####:####:####::1
  netmask 64
  up /sbin/ifconfig ibr0 inet6 add fd12:3456:789a:2::1/64
  down /sbin/ifconfig ibr0 inet6 del fd12:3456:789a:2::1/64

Here, ibr0 is the internal backend bridge to which all my domUs are connected, and the hashed-out 2###:####:####:####::/64 is my Global address on the interface.

As on site 1, I have to configure radvd to advertise the prefix. To this end, I edit /etc/radvd.conf to include:

interface ibr0 { 
        AdvSendAdvert on;
        MinRtrAdvInterval 3; 
        MaxRtrAdvInterval 10;
        prefix 2###:####:####:####::/64 { 
                AdvOnLink on; 
                AdvAutonomous on; 
                AdvRouterAddr on; 
        };
        prefix fd12:3456:789a:2::/64 { 
                AdvOnLink on; 
                AdvAutonomous on; 
                AdvRouterAddr off; 
        };
};

At this point, both sites have a functioning network in the ULA range. Please do check that you can ping the router from client machines, as this is essential to getting the rest to work.

What does not yet work is the connection between both sites; more on that later. First, there is something else that needs to be taken care of: on both sites, firewall rules should be set up to neither send nor receive any ULA addresses on the external IPv6 interface; block the full fc00::/7 both coming in and going out. If we do not do this, a machine on site 1 trying to ping a machine on site 2 realizes that site 2 is outside its own /64 and hands the packet to the router, which will then try to route it onto the public IPv6 net.
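
As a rough sketch of such rules, assuming plain ip6tables is available and that the external interface name is passed in (eth1 here is only a placeholder), the following prints the commands instead of running them, so they can be reviewed first. The function name and the choice between REJECT and DROP are my own assumptions; on OpenWRT you may prefer to express the same thing in the firewall configuration.

```shell
#!/bin/sh
# Sketch: print ip6tables commands that keep ULA traffic (fc00::/7) off the
# external interface. Nothing is changed until the output is piped to sh.
# The function name and the choice of REJECT vs. DROP are assumptions.
ula_block_rules() {
    ext=$1
    echo "ip6tables -A FORWARD -o $ext -d fc00::/7 -j REJECT"
    echo "ip6tables -A OUTPUT  -o $ext -d fc00::/7 -j REJECT"
    echo "ip6tables -A FORWARD -i $ext -s fc00::/7 -j DROP"
    echo "ip6tables -A INPUT   -i $ext -s fc00::/7 -j DROP"
}

ula_block_rules eth1
```

Once the output looks right, it can be piped into sh as root. Note that -A appends, so these rules only take effect if no earlier rule in the chain already accepts the traffic.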

Connecting both sites

In order to connect both sites, really any mechanism that allows for sending IPv6 will do: one could set up an OpenVPN tunnel (with tap devices, as you need to be able to set IPv6 addresses on the interfaces), an ipv6-in-ipv4 tunnel, etc. In this case, though, I will try to not touch IPv4 at all, and I will use what is already there: I will use an ipv6-in-ipv6 tunnel between the sites' external IPv6 addresses, where the traffic inside the tunnel runs in the ULA space.

Fortunately, Linux supports such a setup out-of-the-box using its ip6ip6 mechanism on a tun device.

In the remainder of this example, I will use 2111:1111:1111:1111::1 as site 1's external address, and 2222:2222:2222:2222::2 as site 2's external address.

Inside the tunnel, we will use the new fd12:3456:789a:3::/64 network inside our ULA space.

Setting up the tunnel portal on site 1

Site 1 runs the OpenWRT router, which is a bit tricky to configure. I did not find a good way to set up an ip6ip6 tunnel in the LuCI web interface, so I will include the command to do that in the additional startup script under System -> Startup. Before I do so, though, I will add a new interface called mytun, configure it as static, and set address fd12:3456:789a:3::1/64 on it. My (self-chosen) logic here is that the final part of the address is "1" since this is site 1's end of the tunnel.

Now go to Network -> Interfaces, and add a new "zone" called (e.g.) "tunnel", which includes the mytun interface. Set up firewall rules to allow all traffic inside our ULA in both directions, and also allow open routing between "tunnel" and your "lan" zone. Also go to Network -> Static Routes, and route both fd12:3456:789a:2::/64 and fd12:3456:789a:3::/64 onto the mytun device; we want to be able to reach both the other end of the tunnel and the network on the other side of the tunnel.

Finally, go to System -> Startup, and add the command that will set up the tunnel:

ip -6 tunnel add mytun mode ip6ip6 remote 2222:2222:2222:2222::2 local 2111:1111:1111:1111::1 dev eth1
ifconfig mytun mtu 1400

That final MTU setting requires some explanation: I do not really have native IPv6 on site 1; I have native (and dynamic) IPv4, and my IPv6 comes through an AICCU tunnel with SixXS. By default, SixXS will set an MTU of 1280 bytes for you. This is a safe bet, but it is also the very minimum that IPv6 will accept (IPv4 had a 576-byte minimum). Now, if the SixXS tunnel has a 1280-byte MTU, our ip6ip6 tunnel inside it cannot have even the minimum 1280-byte MTU, as some bytes are needed for the encapsulation headers.

In my case (and after reading the SixXS documentation), it seems that their IPv6-in-IPv4 scheme has 20 bytes of encapsulation, so that I can use an MTU of 1480 bytes inside the Ethernet IP MTU of 1500 bytes. In the case of SixXS, I had to log into my account on their site, and I had to change the tunnel MTU from 1280 to 1480. AICCU then required a restart to pick up the new value. NOTE: The fact that I get IPv6 through a tunnel also means that I needed to substitute eth1 with sixxs.0 in the above command.

The ip6ip6 tunnel inside the SixXS IPv6-in-IPv4 tunnel must have a smaller MTU than the SixXS tunnel. I do not know exactly how much smaller, but 80 bytes of encapsulation is a safe bet; I thus went for 1400 bytes.
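
The bookkeeping above can be written down as a tiny sketch. The function name inner_mtu is made up for this example, and the overhead figures (20 and 80 bytes) are simply the ones I settled on above, not measured values:

```shell
#!/bin/sh
# Sketch of the MTU bookkeeping; the function name is illustrative.
# usage: inner_mtu <outer MTU> <encapsulation overhead in bytes>
inner_mtu() {
    echo $(( $1 - $2 ))
}

sixxs_mtu=$(inner_mtu 1500 20)          # IPv6-in-IPv4 over Ethernet: 1480
ula_mtu=$(inner_mtu "$sixxs_mtu" 80)    # ip6ip6 plus headroom: 1400
echo "SixXS tunnel MTU: $sixxs_mtu, ip6ip6 tunnel MTU: $ula_mtu"
```

The same arithmetic shows why the default does not work: starting from a 1280-byte SixXS tunnel, any overhead at all pushes the inner tunnel below the 1280-byte IPv6 minimum.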


Setting up the tunnel portal on site 2

Site 2 is a vanilla Ubuntu Server running in dom0. IPv6 is offered natively, on the external eth0 interface. As such, configuring the interface can be done in /etc/network/interfaces:

# ULA tunnel to Site 1.
auto mytun
iface mytun inet6 static
  address fd12:3456:789a:3::2
  netmask 64
  mtu 1400
  pre-up ip -6 tunnel add mytun mode ip6ip6 remote 2111:1111:1111:1111::1 local 2222:2222:2222:2222::2 dev eth0
  post-up ip -6 route add fd12:3456:789a:1::/64 dev mytun mtu 1300
  pre-down ip -6 route del fd12:3456:789a:1::/64 dev mytun mtu 1300
  post-down ip -6 tunnel del mytun mode ip6ip6 remote 2111:1111:1111:1111::1 local 2222:2222:2222:2222::2 dev eth0

This configures the tunnel.

Testing the tunnel

At this point, one should ensure that the tunnel itself works, by logging onto site 1's router and issuing ping6 fd12:3456:789a:3::2, and by logging onto site 2's router and issuing ping6 fd12:3456:789a:3::1.

If that works, try pinging across the tunnel: first ping a machine on site 1's backend network from the router on site 2, and a machine on site 2's backend network from the router on site 1. Finally, pinging from a machine on site 1's network directly to a machine on site 2's network should work, and vice versa!
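
To avoid typing the pings one by one, something like the following sketch could be used. The function name check_reachable and the PING override are my own inventions for this example; PING defaults to ping6 and can be set to another command on systems that lack it.

```shell
#!/bin/sh
# Sketch: ping each address and report which ones answer. The function name
# and the PING override are illustrative; PING defaults to ping6.
check_reachable() {
    failed=0
    for addr in "$@"; do
        if ${PING:-ping6} -c 3 "$addr" >/dev/null 2>&1; then
            echo "ok   $addr"
        else
            echo "FAIL $addr"
            failed=1
        fi
    done
    return "$failed"
}

# Example, run on site 1's router:
#   check_reachable fd12:3456:789a:3::2 fd12:3456:789a:2::1
```

The function returns non-zero as soon as one address fails to answer, so it can also be used from a cron job or a monitoring script.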

Securing the tunnel

In this example, I will use IPSec to secure the tunnel. Whereas in IPv4, IPSec requires opening up some UDP ports on both routers, in IPv6 it is built right into the protocol itself. IPSec can be used in roughly three ways:
  1. AH, which only authenticates packets and does not encrypt them. This is hardly used in practice.
  2. ESP in transport mode, for direct host-to-host communication, where all hosts on both networks need to cooperate in the IPSec setup.
  3. ESP in tunnel mode (ESP Tunnel), where network-to-network communication is encrypted on the tunnel only, without the machines on either network needing to know about it.
The easiest setup for my situation is ESP Tunnel: this way, I need to configure IPSec on the routers only, and the whole secured tunnel is transparent to all backend machines.

To this end, I install the ipsec-tools package on both routers; this package is available for both Ubuntu Server and OpenWRT. I will use a simple pre-shared key infrastructure to keep the whole setup as simple as possible.

For a unidirectional ruleset, IPSec needs two keys: an encryption key, and an authentication key. As communication in both directions is treated separately in IPSec, we also need two keys for the other direction. We thus need four keys. In this example, I will use the keys from this howto; do not use these, but generate your own random keys, just as I did!
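
One possible way to produce such keys is sketched below, assuming /dev/urandom, head, od and tr are available. The function name random_hex_key is made up for this example; 24 bytes gives a 192-bit encryption key and 16 bytes a 128-bit authentication key, matching the key lengths in the files that follow.

```shell
#!/bin/sh
# Sketch: print a random key of N bytes as 0x-prefixed hex, the notation
# used in the setkey files. The function name is illustrative.
random_hex_key() {
    printf '0x%s\n' "$(head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n')"
}

# Two directions, each with an encryption and an authentication key.
for direction in site2-to-site1 site1-to-site2; do
    echo "$direction: -E key $(random_hex_key 24)  -A key $(random_hex_key 16)"
done
```

Copy the four values into both configuration files; both routers must of course use the same keys.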

On site 1's router, create a file /etc/ipsec-tools.conf with the following content (replacing the keys with your own), and set its permissions to 700.

#!/usr/sbin/setkey -f

## Flush the SAD and SPD
flush;
spdflush;

# Just simple static keys.
# ESP SAs using 192-bit AES-CBC encryption keys and 128-bit HMAC-MD5 authentication keys
add fd12:3456:789a:3::2 fd12:3456:789a:3::1 esp 0x201 -m tunnel -E aes-cbc 0x7aeaca3f87d060a12f4a4487d5a5c3355920fae69a96c831 -A hmac-md5 0xc0291ff014dccdd03874d9e8e4cdf3e6;
add fd12:3456:789a:3::1 fd12:3456:789a:3::2 esp 0x301 -m tunnel -E aes-cbc 0xf6ddb555acfd9d77b03ea3843f2653255afe8eb5573965df -A hmac-md5 0x96358c90783bbfa3d7b196ceabe0536b;

# Require encryption in between the networks over this tunnel.
spdadd fd12:3456:789a:2::/64 fd12:3456:789a:1::/64 any -P in ipsec
   esp/tunnel/fd12:3456:789a:3::2-fd12:3456:789a:3::1/require;
spdadd fd12:3456:789a:1::/64 fd12:3456:789a:2::/64 any -P out ipsec
   esp/tunnel/fd12:3456:789a:3::1-fd12:3456:789a:3::2/require;

On site 2's router, create the same file, with only one tiny difference: swap in and out on the two spdadd lines:

#!/usr/sbin/setkey -f

## Flush the SAD and SPD
flush;
spdflush;

# Just simple static keys.
# ESP SAs using 192-bit AES-CBC encryption keys and 128-bit HMAC-MD5 authentication keys
add fd12:3456:789a:3::2 fd12:3456:789a:3::1 esp 0x201 -m tunnel -E aes-cbc 0x7aeaca3f87d060a12f4a4487d5a5c3355920fae69a96c831 -A hmac-md5 0xc0291ff014dccdd03874d9e8e4cdf3e6;
add fd12:3456:789a:3::1 fd12:3456:789a:3::2 esp 0x301 -m tunnel -E aes-cbc 0xf6ddb555acfd9d77b03ea3843f2653255afe8eb5573965df -A hmac-md5 0x96358c90783bbfa3d7b196ceabe0536b;

# Require encryption in between the networks over this tunnel.
spdadd fd12:3456:789a:2::/64 fd12:3456:789a:1::/64 any -P out ipsec
   esp/tunnel/fd12:3456:789a:3::2-fd12:3456:789a:3::1/require;
spdadd fd12:3456:789a:1::/64 fd12:3456:789a:2::/64 any -P in ipsec
   esp/tunnel/fd12:3456:789a:3::1-fd12:3456:789a:3::2/require;

The add statements set up the keys and the IPSec type (ESP Tunnel), whereas the spdadd statements require the use of these encryption methods in both directions. The only difference between the two sites is which direction is "in", and which direction is "out".

On Ubuntu server, the init/upstart scripts will automatically use the information from the above file on startup. If you want to enable it now, without rebooting, simply run /etc/ipsec-tools.conf as root.

On OpenWRT, we need to add this script on the System -> Startup page. Simply add the command:

/etc/ipsec-tools.conf

And that is it; run it as root if you want to activate it now without rebooting.

Testing the secured tunnel

Initial tests can be done using the same methodology as before: ping router<->router, router<->backend, backend<->router, and backend<->backend. If that all works, we should ensure that the communication is indeed encrypted. 

To this end, I log on to router 2, and start listening for what the external interface sees when I communicate:

tcpdump -n -i eth0 src 2111:1111:1111:1111::1 or dst 2111:1111:1111:1111::1

Then, from a machine on site 1's backend network, I ping a machine on site 2's backend network. Ensure that you use the ULA address, since otherwise the traffic goes over the public net rather than through the tunnel! If all is well, not only do the pings work, but the tcpdump command will show something like:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
20:06:30.469386 IP6 2111:1111:1111:1111::1 > 2222:2222:2222:2222::2: IP6 fd12:3456:789a:3::1 > fd12:3456:789a:3::2: ESP(spi=0x00000301,seq=0xa7), length 88
20:06:30.469852 IP6 2222:2222:2222:2222::2 > 2111:1111:1111:1111::1: IP6 fd12:3456:789a:3::2 > fd12:3456:789a:3::1: ESP(spi=0x00000301,seq=0x917a), length 88

The first message is the ping request, and the second is the reply. Let's take a closer look at what we see here:
  • We can see the external addresses (logically, since otherwise no communication would be possible) of the routers only.
  • We can see the ULA addresses of the routers only.
  • We can see that we are transporting an 88-byte ESP-encrypted payload.
Let's also mention what we do not see:
  • We do not see what addresses on both backend networks are communicating with each other.
  • We do not see what is being communicated.
And there you have it! Enjoy your secure networking.


Tuesday, October 16, 2012

Hot-replacing a failing disk that is a part of Linux Software RAID and ZFS pools

Disks break: not "if", "when".

Yes, that's what they do. I run a 4-disk setup that holds one Linux Software RAID6 array, and two ZFS RAIDZ2 pools.

Clouds in the sky

As of a few days ago, one of the disks started to fail, which was apparent from syslog entries like these:

[1318523.293294] ata2.00: failed command: READ FPDMA QUEUED
[1318523.304015] ata2.00: cmd 60/01:00:8f:da:14/00:00:4d:00:00/40 tag 0 ncq 512 in
[1318523.304021]          res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[1318523.346321] ata2.00: status: { DRDY ERR }
[1318523.356810] ata2.00: error: { UNC }
[1318523.367279] ata2.00: failed command: READ FPDMA QUEUED
[1318523.377664] ata2.00: cmd 60/3f:08:60:ad:14/00:00:4d:00:00/40 tag 1 ncq 32256 in
[1318523.377670]          res 41/40:00:98:ad:14/00:00:4d:00:00/40 Emask 0x409 (media error)
[1318523.419883] ata2.00: status: { DRDY ERR }
[1318523.430424] ata2.00: error: { UNC }
[1318523.440904] ata2.00: failed command: READ FPDMA QUEUED
[1318523.451164] ata2.00: cmd 60/01:10:95:29:00/00:00:4e:00:00/40 tag 2 ncq 512 in
[1318523.451169]          res 41/40:00:00:00:00/00:00:00:00:00/00 Emask 0x9 (media error)
[1318523.492656] ata2.00: status: { DRDY ERR }
[1318523.503246] ata2.00: error: { UNC }

As I did not have a spare disk on hand (tsk, tsk, tsk, yes, I know...) I immediately ordered one, even before sending the old disk for RMA. Initially, when I ran a zpool scrub on the pools, only these kernel messages showed up; the zpool itself did not notice any trouble.

Thunderstorms in the sky

As of yesterday, errors started making it to the zpool layer:

$ sudo zpool status
  pool: data
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
 scan: scrub repaired 356K in 3h25m with 0 errors on Sun Oct  7 15:16:16 2012
config:

NAME                                                 STATE     READ WRITE CKSUM
data                                                 ONLINE       0     0     0
 raidz2-0                                           ONLINE       0     0     0
   ata-WDC_WD2002FYPS-[serial]-part1  ONLINE       0     0  422K
   ata-WDC_WD2002FYPS-[serial]-part1  ONLINE       0     0     0
   ata-WDC_WD2002FYPS-[serial]-part1  ONLINE       0     0     0
   ata-WDC_WD2002FYPS-[serial]-part1  ONLINE       0     0     0

errors: No known data errors

  pool: ttank
 state: ONLINE
 scan: scrub repaired 0 in 0h56m with 0 errors on Fri Oct  5 11:53:31 2012
config:

NAME                                                 STATE     READ WRITE CKSUM
ttank                                                ONLINE       0     0     0
 raidz2-0                                           ONLINE       0     0     0
   ata-WDC_WD2002FYPS-[serial]-part3  ONLINE       0     0     0
   ata-WDC_WD2002FYPS-[serial]-part3  ONLINE       0     0     0
   ata-WDC_WD2002FYPS-[serial]-part3  ONLINE       0     0     0
   ata-WDC_WD2002FYPS-[serial]-part3  ONLINE       0     0     0

By now, the drive not only had read errors; it even started to return faulty data (despite claiming that said data was ok). Fortunately, ZFS is built from the ground up to never trust hardware, so its checksumming mechanism detected the faulty data. Clearly, it was now time to replace that disk. Fortunately, the spare drive had just come in by mail.

Taking the old disk offline

I run my disks in an IcyBox Hotplug backplane, so I wish to replace the disk without even so much as rebooting the server. One first needs to know which disk this is, of course. Since I use the disk-ID links, just looking at the symlinks in /dev/disk/by-id tells me that the disk in question is /dev/sdb.
To be safe, I read a gigabyte of data off the disk, to physically inspect which drive light switches on as I do so:

# dd if=/dev/sdb of=/dev/null bs=1048576 count=1024

Visual inspection tells me that this is the top drive in the IcyBox. Good.
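
The by-id lookup can also be done the other way around, as a cross-check. The sketch below lists every symlink in a directory (by default /dev/disk/by-id) that resolves to a given device node; the function name ids_for_device is made up for this example, and readlink -f is assumed to be available.

```shell
#!/bin/sh
# Sketch: list the symlinks in a directory (default /dev/disk/by-id) that
# resolve to the given device node. The function name is illustrative.
ids_for_device() {
    target=$(readlink -f "$1")
    for link in "${2:-/dev/disk/by-id}"/*; do
        if [ "$(readlink -f "$link")" = "$target" ]; then
            echo "$link"
        fi
    done
}

# Example: ids_for_device /dev/sdb
```

If the names printed for /dev/sdb carry the serial number of the disk that zpool status complains about, you are looking at the right drive.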

As for ZFS, there is nothing special that one needs to do. For Linux Software RAID, one needs to tell the system to fail the disk, and subsequently remove it from the array:

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid6 sde2[5] sdb2[0] sdc2[4] sdd2[2]
      409996800 blocks super 1.2 level 6, 256k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

unused devices: <none>

Fail the disk:

# mdadm /dev/md0 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md0

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid6 sde2[5] sdb2[0](F) sdc2[4] sdd2[2]
      409996800 blocks super 1.2 level 6, 256k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

unused devices: <none>

Remove the disk:

# mdadm /dev/md0 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md0

# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid6 sde2[5] sdc2[4] sdd2[2]
      409996800 blocks super 1.2 level 6, 256k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

unused devices: <none>
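
Reading the [_UUU] maps by eye works fine for one array; with more arrays, a small filter helps. The following sketch (the function name degraded_arrays is my own) reads mdstat-style text on standard input and prints the name of each array that has a missing member:

```shell
#!/bin/sh
# Sketch: read /proc/mdstat-style text on stdin and print the name of every
# array whose status map (e.g. [_UUU]) contains a missing member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/           { name = $1 }
        /\[[U_]*_[U_]*\]/       { print name }
    '
}

# Example: degraded_arrays < /proc/mdstat
```

For the listing above this prints md0; for a healthy [UUUU] array it prints nothing.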

At this point, one could yank the disk out, but it's better to tell Linux that you are going to do so. Switching off the disk and detaching it from the system is done as follows:

# echo 1 > /sys/block/sdb/device/delete

The syslog will tell you that the device indeed went offline:

[1734127.293861] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[1734127.331629] sd 1:0:0:0: [sdb] Stopping disk
[1734127.768141] ata2.00: disabled

At this point, the tray can be taken from the Hotplug backplane, and the old disk can be replaced by the new one.

Bringing the new disk online

After physically taking out the tray, removing the old disk from the tray, and adding the new disk to the tray, I put the tray back into the backplane. The kernel detects the disk:


[1743181.511929] ata2: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[1743181.512460] ata2: irq_stat 0x00000040, connection status changed
[1743181.512883] ata2: SError: { CommWake DevExch }
[1743181.513215] ata2: hard resetting link
[1743187.276049] ata2: link is slow to respond, please be patient (ready=0)
[1743190.860073] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[1743190.998197] ata2.00: ATA-9: WDC WD20EFRX-[serial], max UDMA/133
[1743190.998206] ata2.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[1743190.998836] ata2.00: configured for UDMA/133
[1743190.998855] ata2: EH complete
[1743190.999097] scsi 1:0:0:0: Direct-Access     ATA      WDC WD20EFRX-[serial]
[1743190.999679] sd 1:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[1743190.999691] sd 1:0:0:0: [sdf] 4096-byte physical blocks
[1743190.999705] sd 1:0:0:0: Attached scsi generic sg1 type 0
[1743191.000185] sd 1:0:0:0: [sdf] Write Protect is off
[1743191.000197] sd 1:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[1743191.000328] sd 1:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[1743191.014153]  sdf: unknown partition table
[1743191.014902] sd 1:0:0:0: [sdf] Attached SCSI disk
[1743415.817135]  sdf: unknown partition table

Obviously, there are no partitions on the disk yet. In order to create them, I simply copy them off one of the other drives:

# sfdisk -d /dev/sdc | sfdisk /dev/sdf

This is readily picked up by the kernel:

[1743415.817135]  sdf: unknown partition table
[1743416.227972]  sdf: sdf1 sdf2 sdf3

Resilvering the arrays

The first array I decide to resilver is the most important one: the primary data pool:

# zpool replace data /dev/disk/by-id/ata-WDC_WD2002FYPS-[serial]-part1 /dev/disk/by-id/ata-WDC_WD20EFRX-[serial]-part1

This is going to take a long time: more when this is done.




Saturday, August 25, 2012

No-downloading inconveniences in the digital age

My country does not allow me to download music/movies for personal use!

While a good number of countries (e.g., The Netherlands, Switzerland) have relatively sane laws that allow the downloading (though not uploading) of music and movies, there are a good number of other countries where even the downloading of music and movies for personal use is forbidden.

Even if one does live (or operates a server) in one of the latter countries, these restrictions are but small inconveniences that are easily worked around.

Case in point here is an Ubuntu Linux server in one of these countries, to which somebody wants to download content from the Giganews Usenet provider, and on which that person sets up OpenVPN himself. Note that although this article is written in terms of Linux and Giganews, the general principles readily carry over to other situations.

The solution: OpenVPN

The solution in this case is to hide the fact that you are using the service from the country with the backward laws that you happen to be in. A simple mechanism to do this is to use OpenVPN: one creates an encrypted VPN tunnel over which one tunnels the connections to Giganews.

If one already has an account at Giganews, Giganews offers a branded deal through VyprVPN where you get OpenVPN access for $5 per month.

Step 1: Apply for OpenVPN access at Giganews

Just follow the steps on their website: you can't go wrong there.

Step 2: Install OpenVPN

sudo apt-get install openvpn

(easy enough)

Step 3: Install the VyprVPN root certificate

sudo wget -O /etc/openvpn/ca.vyprvpn.com.crt http://www.giganews.com/vyprvpn/ca.vyprvpn.com.crt

This allows your OpenVPN client to ascertain that it is indeed talking to VyprVPN, and not to some man-in-the-middle attack box your government may have put in place.

Step 4: Create a configuration for your VyprVPN

The easiest way to do this is to create two files: one that contains your Giganews username and password, and one that contains the OpenVPN client configuration. The names are arbitrary, but I happen to use these:

/etc/openvpn/vyprvpn.pass contains:

gn123456
abcd1234

(replace these two lines with your actual username and password).

/etc/openvpn/vyprvpn.conf contains:

client
dev tun
proto udp
remote eu1.vpn.giganews.com 1194
resolv-retry infinite
nobind
persist-key
persist-tun
persist-remote-ip
ca ca.vyprvpn.com.crt
tls-remote eu1.vpn.giganews.com
auth-user-pass vyprvpn.pass
comp-lzo
verb 1

(you could replace the eu1 part with several other options, but eu1 is in the Netherlands, where downloading is legal).

Step 5a: Fire and forget

Upon boot, your server will now automatically start up your VyprVPN, and route all traffic through it. You can also force it right now by issuing:

sudo /etc/init.d/openvpn restart

If that is not what you want, e.g., because you use the box for other purposes, too, the next step will describe how to route just your Giganews traffic through the VPN.

Step 5b (optional): Route just Giganews traffic through the VPN.

If this is what you want, this is possible, too. Simply add the four lines directly below client (route-noexec, route-up, down, and script-security) to your /etc/openvpn/vyprvpn.conf file:

client
route-noexec
route-up /etc/openvpn/vyprvpn-route-up.sh
down /etc/openvpn/vyprvpn-route-down.sh
script-security 2
dev tun
proto udp
remote eu1.vpn.giganews.com 1194
resolv-retry infinite
nobind
persist-key
persist-tun
persist-remote-ip
ca ca.vyprvpn.com.crt
tls-remote eu1.vpn.giganews.com
auth-user-pass vyprvpn.pass
comp-lzo
verb 1

The route-noexec option tells OpenVPN to not directly use all route pushes it gets from the VyprVPN server, but to pass options via environment variables to scripts in which you are in control of what happens.

In my case, I wanted to use news-europe.giganews.com for downloading. I used whois to figure out that their IP range in Europe is 216.196.96.0/19. The two scripts mentioned above now contain:

/etc/openvpn/vyprvpn-route-up.sh:

#!/bin/bash

# Route Giganews Europe (216.196.96.0/19), and ONLY Giganews,
# through VyprVPN.
ip route add 216.196.96.0/19 dev "$dev"

(note that $dev is passed in the environment by OpenVPN).

/etc/openvpn/vyprvpn-route-down.sh:

#!/bin/bash

# Remove routing for Giganews Europe (216.196.96.0/19).
ip route del 216.196.96.0/19

Step 6: Check that things work

Quickly check that your routing to Giganews indeed goes through the VPN:

traceroute news-europe.giganews.com

traceroute to news-europe.giganews.com (216.196.109.144), 30 hops max, 60 byte packets
 1  10.25.0.1 (10.25.0.1)  14.601 ms  14.606 ms  14.611 ms
 2  * * *
 3  vl304.gw1.ams.giganews.com (216.196.108.218)  15.268 ms  15.309 ms  15.274 ms
 4  news-europe.giganews.com (216.196.109.144)  14.964 ms  15.195 ms  15.210 ms

Here, the first hop being on a private subnet (10.25.0.1, on 10.0.0.0/8, which is private) tells you that traffic is routed correctly.
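
To double-check that the addresses seen in the traceroute really fall inside the routed range, a small sketch like the following can help. The function names ip_to_int and in_cidr are made up for this example, and it handles plain dotted-quad IPv4 addresses only:

```shell
#!/bin/sh
# Sketch: test whether an IPv4 address lies inside a CIDR range.
# Function names are illustrative; octets must not have leading zeros.

# Convert a dotted quad to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# usage: in_cidr <address> <network>/<prefix length>
in_cidr() {
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

in_cidr 216.196.109.144 216.196.96.0/19 && echo "news-europe is inside the routed range"
```

Both 216.196.109.144 and the intermediate hop 216.196.108.218 pass this test, while the VPN gateway 10.25.0.1 does not, which is exactly what one expects.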

Happy networking!

Wednesday, February 22, 2012

Local-disk encryption to protect against casual privacy loss

Like many others, I store a lot of privacy-sensitive information on the disks of my local server: photos, scanned documents, and more. I do not feel the need to protect that data from those who have physical access to the machine, let alone to protect that data from authorities, should those ever come along with a (mistaken) warrant. No, the protection I seek is much simpler:

The protection I would like is against those who get one of my disks, for example when I exchange a disk under warranty. It would not be the first time that such a disk is resold, or that the friendly shop personnel scan the disk for interesting data. Also, my other server, which sits in a remote datacenter, should not leak information when a disk is exchanged.

The simple mechanism by which I now do this is to access the underlying disks (or partitions) of my data disks through dm_crypt, and to create zpools, mdraid arrays, or simple filesystems on top of those dm_crypt-mapped block devices. The normal way to do this is to add the required entries to /etc/crypttab, but I find that Ubuntu sets up these devices too late in the game. Therefore, I created my own script.

On my remote server, I have a script in /etc/init.d/local-cryptsetup, which contains:

#!/bin/bash
/sbin/cryptsetup -d /etc/mydevs/passwd.dat create zloop0 /dev/disk/by-id/[NAME_DISK1]
/sbin/cryptsetup -d /etc/mydevs/passwd.dat create zloop1 /dev/disk/by-id/[NAME_DISK2]

In /etc/rc2.d, /etc/rc3.d, /etc/rc4.d, and /etc/rc5.d, I create a symlink called S05local-cryptsetup to the above script. I chose the number S05 because I use these mappings as underlying devices for a ZFS zpool, and the ZFS subsystem is started at S20. As S05 < S20, this ensures that the mappings are available before ZFS attempts to start using them.
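
Creating the four links by hand is easy enough, but as a sketch it could look like this. The function name link_local_cryptsetup and its optional root-prefix argument are assumptions of this example; the prefix only exists so that the loop can be tried in a scratch directory before touching the real /etc.

```shell
#!/bin/sh
# Sketch: create the S05 start links in runlevels 2-5. The optional first
# argument is a root prefix (an assumption of this example) so the loop can be
# tried in a scratch directory before touching the real /etc.
link_local_cryptsetup() {
    root=${1:-}
    for level in 2 3 4 5; do
        ln -s ../init.d/local-cryptsetup \
            "$root/etc/rc$level.d/S05local-cryptsetup"
    done
}

# Example (as root): link_local_cryptsetup
```

The links are relative (../init.d/local-cryptsetup), which is the usual convention in the rc directories.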

Initializing the ZPool once was easy enough:

# zpool create tank mirror /dev/mapper/zloop0 /dev/mapper/zloop1

I ensured that the pool, and all data in it, successfully survive a reboot.