Exporting Cassandra 2.2 to Google BigQuery

So we decide to move 5 years of data from Apache Cassandra to Google BigQuery. The problem was not just transferring the data or export/import, the issue was the very old Cassandra!

After extensive research, we have planned the migration to export data to csv and then upload in Google Cloud Storage for importing in Big Query.

The pain was the way Cassandra 1.1 deal with large number of records! There is no pagination so at some point your gonna run out of something! If not mistaken, pagination is introduced since version 2.2.

After all my attempts to upgrade to latest version 3.4 failed I decide to try other versions and luckily the version 2.2 worked! By working I mean I were able to follow the upgrading steps to end and the data were accessible.

Because I could not get any support for direct upgrade and my attempts to simply upgrade to 2.2 also failed. So I had no choice but to upgrade to 2.0 and then upgrade it to 2.2. Because this is extremely delicate task I rather just forward you to official website and only then give you the summary. Please make sure you check docs.datastax.com and follow their instructions.

To give an overview, you are going to do these steps:

  1. Making sure all nodes are stable and there is no dead nodes.
  2. Make backup (your SSTables, configurations and etc)
  3. It is very important to successfully upgrade your SSTable before proceeding to next step. Simply use
    nodetool upgradesstables
  4. Drain the nodes using
    nodetool drain
  5. Then simply stop the node
  6. Install the new version (I will explain fresh installation later in this document)
  7. Simply do the config as your old Cassandra, start it and upgradesstables again (as in step 3) for each node.

Installing Cassandra:

  1. Edit /etc/yum.repos.d/datastax.repo
  2. [datastax]
    name = DataStax Repo for Apache Cassandra
    baseurl = https://rpm.datastax.com/community
    enabled = 1
    gpgcheck = 0
    
  3. And then install and start the service:
  4. yum install dsc20
    service cassandra start
    

Once you are upgrade to Cassandra 2+ you can export the data to csv without having pagination or crashing issue.

Just for the records, a few commands to get the necessary information about the data structure is as follow:

cqlsh -u username -p password
describe tables;
describe table abcd;
describe schema;

And once we know the tables we want to export we just use them alongside its keyspace. First add all your commands in one file to create a batch.

vi commands.list

For example a sample command to export one table:

COPY keyspace.tablename TO '/backup/export.csv';

And finally run the commands from the file:

cqlsh -u username -p password -f /backup/commands.list

So by now, you have exported the tables to csv file(s). All you need to do now is uploading the files to Google Cloud Storage:

gsutil rsync /backup gs://bucket

Later on you can use Google API to import the csv files to Google BigQuery. You may check out the Google documentations for this in cloud.google.com

Advertisements

Linux restricted administration

One of the challenges when adding a user in Linux environment is when you need to precisely define what they can or can not do. Some may find configuring authorisation in Linux a bit complicated. In my case I needed to add a user with privilege to execute some commands with sudo but without full root access.

The first thing is to create the user we want to customise:
adduser jack

and then create the key and later pass the id_rsa to him:

ssh-keygen -t dsa
mkdir jack/.ssh
chmod 700 jack/.ssh/
cp id_rsa.pub jack/.ssh/authorized_keys
chmod 600 jack/.ssh/authorized_keys
chown -R jack:jack jack/.ssh

Then you should edit the sudoers:

visudo -f /etc/sudoers.d/developers-configs
User_Alias DEVELOPERS = jack,john

Cmnd_Alias SEELOGS = /usr/bin/tail /var/log/nginx/*.error, /usr/bin/tail /var/log/nginx/*.access, /bin/grep * /var/log/nginx/*.error, /bin/grep * /var/log/nginx/*.access

Cmnd_Alias EDITCONFIGS = /bin/vi /etc/nginx/site.d/*.conf, /usr/bin/nano /etc/nginx/site.d/*.conf, /bin/cat /etc/nginx/site.d/*.conf

Cmnd_Alias RESTARTNGINX = /sbin/service nginx status, /sbin/service php-fpm status, /sbin/service nginx restart, /sbin/service nginx configtest, /sbin/service php-fpm restart

DEVELOPERS ALL = NOPASSWD: SEELOGS,EDITCONFIGS,RESTARTNGINX

We just create DEVELOPERS as alias for users and SEELOGS, EDITCONFIGS, RESTARTNGINX as alias of commands the user can excute; and then assigned SEELOGS, EDITCONFIGS and RESTARTNGINX privilages to DEVELOPERS. If you want users to be prompted for password you can remove the “NOPASSWD:” part.

Please note that depends on your OS you may need to add the user in “/etc/ssh/sshd_config” … for example “AllowUsers jack”.

Free secure backup in MEGA

http://mega.nz/ is a secured cloud storage that gives away up to 50GB free space. Using this service is recommended due to very tight security as even the user will not be able to gain access to data if he lose the password (and lose the recovery key).

 

Mega provided some scripts for uploading, syncing and etc to their cloud. This is specially useful when it comes to cheap secure backup of your files. All you need to do is creating a free account for beginning and perhaps purchase a premium account for a better service.

First it is just appropriate to setup proper locale variables:

localedef -i en_US -f UTF-8 en_US.UTF-8

And then setup the dependencies (Amazon Linux):

yum groupinstall 'Development Tools'
yum install glib* openssl-devel libcurl-devel libproxy gnutls
rpm -i ftp://rpmfind.net/linux/centos/6.7/os/x86_64/Packages/glib-networking-2.28.6.1-2.2.el6.x86_64.rpm
yum update kernel
yum update

OK! So now we have all the required libraries in place, we can proceed to megatools installation:

wget https://megatools.megous.com//builds/megatools-1.9.95.tar.gz
tar xvf megatools-1.9.95.tar.gz 
cd megatools-1.9.95
./configure
make

In case you don’t want to pass the account username password with each command we can just save it PLAINTEXT in a file (it is safer from some aspects but it has the risk of unauthorised access to this file)

vi /root/.megarc:
[Login]
Username = your-email
Password = your-password

Just to test, run mega disk free script which gives you some info about space you have on the cloud:
./megadf

You can find some more commands details @https://megatools.megous.com/man/megatools.html

And at the end some examples:
./megaput file_to_upload
./megaget /Root/file_to_download

Monitoring SMTP server (using mailgraph, qshape and postqueue

Monitoring SMTP mails never been easier! You can check the number of sent emails, bounced emails, rejected emails and etc (you can see demo at http://www.stat.ee.ethz.ch/mailgraph.cgi).

Let’s get the source:

wget http://mailgraph.schweikert.ch/pub/mailgraph-1.14.tar.gz
tar xf mailgraph-1.14.tar.gz
mv mailgraph-1.14 /var/mailgraph

and install dependencies:

yum install perl-rrdtool
yum install perl-File-Tail

and run the mailgraph once to fetch the previous logs:

./mailgraph.pl -l /var/log/maillog -c -v

then run it as service:

cp mailgraph.pl /usr/local/bin/
./mailgraph-init start

Now the cgi file is needed to be executed by web server so once we installed the web server we will configure CGI configs. The main changes are AddHandler, Add ExecCGI to “/var/mailgraph” directory and add mailgraph.cgi as an index file like index.php.

yum install httpd
cp /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.original
vi /etc/httpd/conf/httpd.conf (appendix 01)
service http start
chkconfig httpd on

Now you should be able to browse the web server and see the stats. I have to note that the stats are not getting updated frequently. I coolant figure out how frequent this is getting updated but sometimes I had to run the mailgraph script manually to get latest stats. At this moment we are done with mailgraph.

Now we just take a look at shape and other CLI tools for monitoring SMTP. qshape and post queue are two useful scripts found in postfix additional scripts. We need to install it first:

yum install postfix-perl-scripts

And here are some example of the tool to get the stat of different queues:

qshape hold
qshape deferred
qshape active

postqueue is another tool:

postqueue -p
postqueue -p | egrep -c "^[0-9A-F]{10}[*]"
postqueue -p | egrep -c "^[0-9A-F]{10}[^*]"

we also can use the maillot directly with assistance of grep:

grep -c "postfix/smtp.*status=sent" /var/log/maillog 
grep -c "postfix/smtp.*status=bounced" /var/log/maillog

Continue reading

Deploying SMTP server using Postfix and OpenDKIM

Days ago we got some issue with Amazon SES and decide to make our own SMTP relay service. I tried the combinations of sendmail with dim-milter but for some reason I could not make it work so I start another server from scratch which did work this time. This post mainly focus on configuring OpenDKIM as Postfix is fairly straightforward.

The first thing to do is disabling sendmail which is the default SMTP client on Amazon Linux:

service sendmail stop
chkconfig sendmail off

And then installing Postfix and configuring it:

yum install postfix
cp /etc/postfix/main.cf /etc/postfix/main.cf.original 
vi /etc/postfix/main.cf (appendix 01)

Configuration mainly involves adding DKIM settings (such as milter socket info) and modifying receptions restrictions. In the new setting we define sender_access file to contain the list of senders who can relay through our SMTP service.

cat "DOMAIN.COM OK" >> /etc/postfix/sender_access
postmap /etc/postfix/sender_access

Now we are done with Postfix so it is better to check by sending a test mail.

service postfix start
chkconfig postfix on

The main concern of this post, DKIM, is to cryptographically validate the sender is really from that domain (i.e. domain.com). The validation process starts with the SMTP server signing the email using its private key and then the destination mail server tries to match the private key with the public key obtained from senders’ claimed domain DNS. Once the private and public key matched then it means the SMTP server is from the domain it claims to be.

To install the DKIM we need some API from sendmail and openssl:

yum install sendmail-devel openssl-devel

openDKIM is not available in default repository so we will add it and then install it:

rpm -Uvh http://mirror.pnl.gov/epel/6/i386/epel-release-6-8.noarch.rpm
yum --disablerepo=* --enablerepo=epel install opendkim

Once installation finished we proceed to configurations. There are 3 main files to configure: opendkim.conf contains the main configs such as address of signing table file, key table file and the socket address to listen. The key table file contains the list of keys. The signing table defines which domains should be signed by which key. You will also need to add trusted IP addresses of senders in /etc/opendkim/TrustedHosts to grant them access to SMTP.

cp /etc/opendkim.conf /etc/opendkim.conf.original
vi /etc/opendkim.conf (appendix 02)
vi /etc/opendkim/KeyTable  (appendix 03)
vi /etc/opendkim/SigningTable (appendix 04)

Now we need to generate a pair of keys (private and public) which the public key will be added into DNS records of send domain:

mkdir /etc/opendkim/keys/DOMAIN.COM
opendkim-genkey -D /etc/opendkim/keys/DOMAIN.COM/ -d DOMAIN.COM -s default
mv /etc/opendkim/keys/DOMAIN.COM/default.private /etc/opendkim/keys/DOMAIN.COM/default
chown -R opendkim:opendkim /etc/opendkim/keys/DOMAIN.COM
cat /etc/opendkim/keys/DOMAIN.COM/default.txt

Then start the service:

service opendkim start
chkconfig opendkim on

Finally, you have to add a TXT record in your DNS dashboard. The record name should be default._domainkey.DOMAIN.COM and it should contains something like the following (based on the /etc/opendkim/keys/DOMAIN.COM/default.txt):
“v=DKIM1; g=*; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDHY7Zl+n3SUldTYRUEU1BErHkKN0Ya52gazp1R7FA7vN5RddPxW/sO9JVRLiWg6iAE4hxBp42YKfxOwEnxPADbBuiELKZ2ddxo2aDFAb9U/lp47k45u5i2T1AlEBeurUbdKh7Nypq4lLMXC2FHhezK33BuYR+3L7jxVj7FATylhwIDAQAB”

/etc/postfix/main.cf (appendix 01)

smtpd_milters           = inet:127.0.0.1:2525
non_smtpd_milters       = $smtpd_milters
milter_default_action   = accept

smtpd_error_sleep_time = 1s
smtpd_soft_error_limit = 10
smtpd_hard_error_limit = 20

smtpd_recipient_restrictions = 
	 permit_mynetworks
	 check_sender_access hash:/etc/postfix/sender_access
	 reject_unauth_destination

queue_directory = /var/spool/postfix
command_directory = /usr/sbin
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
mail_owner = postfix

inet_interfaces = all
inet_protocols = all

mydestination = $myhostname, localhost.$mydomain, localhost
unknown_local_recipient_reject_code = 550

alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases

debug_peer_level = 2
debugger_command =
	 PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
	 ddd $daemon_directory/$process_name $process_id & sleep 5

sendmail_path = /usr/sbin/sendmail.postfix
newaliases_path = /usr/bin/newaliases.postfix
mailq_path = /usr/bin/mailq.postfix
setgid_group = postdrop
html_directory = no

manpage_directory = /usr/share/man
sample_directory = /usr/share/doc/postfix-2.6.6/samples
readme_directory = /usr/share/doc/postfix-2.6.6/README_FILES

/etc/opendkim.conf (appendix 02)

Canonicalization        relaxed/relaxed
ExternalIgnoreList      refile:/etc/opendkim/TrustedHosts
InternalHosts           refile:/etc/opendkim/TrustedHosts
KeyTable                refile:/etc/opendkim/KeyTable
LogWhy                  Yes
MinimumKeyBits          1024
Mode                    sv
PidFile                 /var/run/opendkim/opendkim.pid
SigningTable            refile:/etc/opendkim/SigningTable
Socket                  inet:2525@127.0.0.1
Syslog                  Yes
SyslogSuccess           Yes
TemporaryDirectory      /var/tmp
UMask                   022
UserID                  opendkim:opendkim

/etc/opendkim/KeyTable (appendix 03)

default._domainkey.DOMAIN.COM:default:/etc/opendkim/keys/DOMAIN.COM/default

/etc/opendkim/SigningTable (appendix 04)

*@DOMAIN.COM default._domainkey.DOMAIN.COM

The Linux Cheatsheet

FIX CHANGE LOCALE ERROR:
localedef -i en_US -f UTF-8 en_US.UTF-8
vi /etc/profile.d/locale.sh
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
export LC_COLLATE=C
export LC_CTYPE=en_US.UTF-8
source /etc/profile.d/locale.sh

MANUAL LOG ROTATION
touch maillog2; chmod `stat -c %a maillog` maillog2; chown `stat -c %u maillog` maillog2;
mv maillog maillog-$(date +%Y%m%d); mv maillog2 maillog; service rsyslog restart;
screen -d -m bzip2 maillog-$(date +%Y%m%d);

CHANGE HOSTNAME (CENTOS7):
hostnamectl set-hostname voyager123

REMOVE YUM REPO:
yum-config-manager –disable repository-id-or-name

USING ANTI VIRUS SOLUTION:
yum –enablerepo=epel install clamav
freshclam
clamscan –infected –remove –recursive /home

TO MOVE ALL OLD FILES INTO A DIRECTORY:
mv `ls -1tr | head -n 10` archive/

USING CLI and CURL TO GET PUBLIC IP:
curl http://ip4.me 2>/dev/null | sed -e ‘s#]*>##g’ | grep ‘^[0-9]’

USING CLI AND CURL TO GET EC2 INFO:
curl http://169.254.169.254/latest/meta-data/

FIND TOP 10 LARGEST DIRECTORIES:
find . -type d -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}

MONITORING TOP 5 PROCESS:
ps -eo pid,pcpu,args –no-headers | sort -nk 2 | head -n 5
ps -Ao pid,pcpu,args –sort=-pcpu | head -n 5
top -b -n 1 | head -n 12 | tail -n 5

MONITOR SPECEFIC PROCESS:
top -b -n 10 | grep something

FIND WHICH PACKAGE PROVIDE A FILE OR FEATURE:
yum provides ‘*File/Tail.pm’

PREPARE FTP AND USERS:
yum install vsftpd
vi /etc/vsftpd/vsftpd.conf
local_enable=YES
anonymous_enable=NO
service vsftpd restart
groupadd ftpusers
adduser -g ftpusers -d /var/www/html/ws1 -s /sbin/nologin -c commentsgoeshere -p password ftp.user1
chown -R apache:ftp.user1 /var/www/html/ws1
chmod 775 -R /var/www/html/ws1

INSTALL AND ENABLE EPEL REPOSITORY:
rpm -Uvh http://epel.mirror.net.in/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum –disablerepo=* –enablerepo=epel list available

INSTALL RPMFORGE REPOSITORY:
rpm -Uvh http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.i686.rpm

INSTALL RAR AND 7ZIP:
yum install rar unrar p7zip -y

CHECK A WEBSITE REDIRECTIONS AND HEADERS ONLY:
curl -L -s -D – somedomain.com -o /dev/null

DOWNLOAD FROM FTP USING CURL:
curl -u ‘username’:’password’ ‘ftp://ftp-server/file.zip’ -o local_file.zip

FINDS ALL FILES CONTAINING A PATTERN:
grep -rnw ‘directory’ -e “pattern”

DOWNLOAD ALL CONTENTS OF THE FTP USING WGET
wget -r -l 0 –user=”username” –ask-password ftp://domain.com/public_html/

COPY FILE FROM LOCAL TO SERVER PORT 2222:
scp -p 2222 -i /home/user/key.pem /home/user/backup.sql username@server:/opt/mysql/backup.sql

COPY FILE FROM SERVER PORT 2222 TO LOCAL:
scp -p 2222 -i /home/user/key.pem username@server:/opt/mysql/backup.sql /home/user/backup.sql

SYNC FILES FROM LOCAL TO SERVER IN PORT 2222:
rsync -azPvrltgo -e ‘ssh -p 2222’ /var/mail/ root@server:/var/mail/

SYNC FILES FROM SERVER TO LOCAL IN PORT 2222:
rsync -azPvrltgo -e ‘ssh -p 2222′ root@server:/var/mail/ /var/mail/

RESET MYSQL ROOT PASSWORD:
sudo service mysqld stop
sudo mysqld_safe –skip-grant-tables &
mysql -u root
use mysql;
update user set password=PASSWORD(“newpassword”) where User=’root’;
flush privileges; //not sure the effects
quit
sudo service mysqld restart

REPLACE STRING IN ALL FILES WITH SPECEFIC PATTERN (IN THE CURRENT DIR):
find . -type f -name “*.*” -exec sed -i ‘s/old-text/new-test/g’ {} +
find . -type f -name “db-*-data-config.xml” -exec sed -i ‘s/old-text/new-text/g’ {} +
find . -type f -name “solrconfig.xml” -exec sed -i ‘s/old-text/new-text/g’ {} +

SHOW CONTENTS OF ALL THE FILES THAT FOUND:
find . -type f -name “db*data-config.xml” -exec cat {} \;

MAKE BACKUP OF SPECIFIC FOLDERS:
find . -type d -name “conf” -exec cp –parents -a {} /root/conf_backup/ \;

Continue reading

Join Amazon Linux to Domain Controller

I recently had a case where Linux machines were joined to a domain alongside the windows machines. Domain users and groups were accessible and configurable in Linux machines using Samaba. The good thing about domain configuration was that it simple was possible to assign domain accounts to users and they use it to login in both windows and linux machines.

There is no need to manage users on each linux machine which is a great deal for enterprises. Below I walk you through the linux setup and configuration for using Samba, Kerberos and LDAP.

I strongly suggest to backup your data before doing anything. It is a better idea if you can just use a staging machine to test new setup instead of start running commands on your production environment.

The first step is to install Samba; but since Amazon Linux does not include Samba repository we add it and then install it.

wget http://ftp.sernet.de/pub/samba/3.5/centos/5/sernet-samba.repo
mv sernet-samba.repo /etc/yum.repos.d/
yum install samba -y

And then installing Kerberos:

yum install krb5-workstation -y

Obviously the important part is always configuration! just for reference here is the list of config file you may need to change:

  1. /etc/resolve.conf: contains the name servers and you may want to add the domain nameserver too or else you will not able to join it.
  2. /etc/sysconfig/network: contains the network configuations and you may need to change localhost name to a DQDN (fully qualified domain name) like abcd.mydomain.local
  3. /etc/krb5: contains Kerberos network authentication protocol configurations and for sure you will need to update it.
  4. /etc/samba/smb.conf: contains the Samba configurations and necessary for everything to work.
  5. /etc/openldap/ldap.conf: contains LDAP (lightweight directory access protocol) configurations
  6. /etc/hosts: contains local records of hosts’ ip addresses. You might need to change some host’s ip here specially if you changed hostname.

I have copied my configurations as appendices so you can use them as a working reference.

At the end there is a last piece of configuration that needs to be done before trying to join domain. Here we enable some of the authentication configuration:

authconfig --enablekrb5 --enablewinbind --enablemkhomedir --update

Finally we can join to domain by “net ads” command as follow:

net ads join -W MYDOMAIN.LOCAL -U adadminuser -S ad.mydomain.local

Appendix 1: resolve.cong

search ap-southeast-1.compute.internal mydomain.local
nameserver 10.x.x.x
nameserver 10.1.0.2

Appendix 2: kbr5.conf

# make sure server name is capital.

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = MYDOMAIN.LOCAL
 dns_lookup_realm = true
 dns_lookup_kdc = true
 ticket_lifetime = 24h
 forwardable = yes

[realms]
 MYDOMAIN.LOCAL = {
 }

[domain_realm]
 .mydomain.local = MYDOMAIN.LOCAL
 mydomain.local = MYDOMAIN.LOCAL

[appdefaults]
 pam = {
   debug = false
   ticket_lifetime = 36000
   renew_lifetime = 36000
   forwardable = true
   krb4_convert = false
 }

Appendix 3: smb.conf

[global]
workgroup = MYDOMAIN
realm = MYDOMAIN.LOCAL
security = ads
idmap uid = 16777216-33554431
idmap gid = 16777216-33554431
template homedir = /home/MYDOMAIN/%U
template shell = /bin/bash
winbind use default domain = true
winbind offline logon = false
server string = Samba Server Version %v

log file = /var/log/samba/log.%m
max log size = 50

passdb backend = tdbsam

load printers = yes
cups options = raw

[homes]
comment = Home Directories
browseable = no
writable = yes

[printers]
comment = All Printers
path = /var/spool/samba
browseable = no
guest ok = no
writable = no
printable = yes

Further info:
http://www.server-world.info/en/note?os=CentOS_7&p=samba&f=3
https://wiki.samba.org/index.php/Setup_a_Samba_AD_Member_Server