How s3fs caching can bite you

…or one day spent just to add one line of code:

s3fs.S3FileSystem.cachable = False

Adding caching under the hood and not mentioning it in the documentation – that is a dirty trick.

My case was a Lambda processing S3 files. When a file arrives on S3, a Lambda processes it and triggers the next Lambda. The next Lambda works fine only the first time.

The first Lambda uses only boto3 and has no problem.

The second Lambda uses s3fs.

The second invocation of the Lambda reuses the already initialized execution context, so s3fs thinks it knows which objects are on S3 – but it is wrong!

So… I found this issue – thank you, jalpes196!

Another way is to invalidate the cache…

from s3fs.core import S3FileSystem


S3FileSystem.clear_instance_cache()
s3 = S3FileSystem(anon=False)
s3.invalidate_cache()

Daily systemd commands

You need to create a unit file for the service and one for the timer.

Both the timer and the service unit files must be created, and the timer must be enabled if you want it to run on schedule.
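As a minimal sketch (the unit name advanced matches the commands below; the script path is just an assumption), the two files in /etc/systemd/system could look like this:

# /etc/systemd/system/advanced.service
[Unit]
Description=Advanced daily job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/advanced-job.sh

# /etc/systemd/system/advanced.timer
[Unit]
Description=Run advanced.service once a day

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

Enable and start the timer:

systemctl enable --now advanced.timer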

Reload

This must be done whenever you change unit or timer files in /etc/systemd/system:

 systemctl daemon-reload

View the logs

journalctl -u advanced.service 

The status of the unit/service

systemctl status advanced.service
systemctl status advanced.timer

Check the timers

systemctl list-timers --all

Why one should use Firefox in 2020

I switched from Google Chrome to Chromium for security and privacy reasons. Now I am switching from Chromium to Firefox because of several issues.

Chromium stopped shipping deb packages and started using snap. The snap runs in a cgroup (probably) and hides very important folders from the browser:

  • /tmp
  • ~/.ssh

Certificates
My access to a payment website was rejected because my certificates are in ~/.ssh, which the snap cannot see.

System tmp
When I download junk files or attachments I store them in /tmp, and on the next system reboot /tmp is cleaned. Since I can’t access /tmp from Chromium, I started using ~/tmp/ and now have tons of useless files.

Speed
When I switched to Firefox I noticed that it is much faster than Chrome.

  • After migrating to snap, Chromium does not work correctly with D-Bus.
  • Firefox is faster.
  • No easy way to add a custom search engine.

Sort AWS S3 keys by size

A naive script which sorts the keys in an S3 folder by size. It will not work if the keys contain spaces (see the jq variant after the script).

Here is a usage example:

aws s3 ls BUCKETNAME/signals/wifi/ | ~/bin/aws-s3-sort.rb

And here is the script itself (~/bin/aws-s3-sort.rb):
#!/usr/bin/ruby
content = ARGF.read

lines = content.split("\n")

# Each line of `aws s3 ls` looks like: "2020-01-01 12:00:00  12345 some-key"
entries = []
lines.each do |line|
  cells = line.split(' ')
  next unless cells.length >= 4          # skip "PRE ..." prefix lines and blanks
  entries << [cells[2].to_i, cells[3]]   # [size, key]
end

entries.sort_by { |size, _key| size }.each do |size, key|
  puts "#{size} -> #{key}"
end
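If the keys do contain spaces, a variant based on s3api and jq avoids the whitespace splitting entirely (a sketch; BUCKETNAME and the prefix are placeholders):

aws s3api list-objects-v2 --bucket BUCKETNAME --prefix signals/wifi/ \
  | jq -r '.Contents | sort_by(.Size) | .[] | "\(.Size) -> \(.Key)"'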

Extract a huge number of files from AWS s3 glacier

You can first try s3cmd and if it doesn’t work, go for an advanced solution which supports millions of files.

s3cmd restore \
    --recursive s3://bucket.raw.rifiniti.com \
    --restore-days=10

To bulk-request objects to be restored from Glacier I use the script below. I hope it will be useful to you too.

#!/bin/bash
#
# Get s3 objects from glacier by prefix
# The prefix is optional!
#
# How to use:
#  ./export-prefix.sh bucketName 30 2019-04-30
#  ./export-prefix.sh bucketName 30
#
#
export bucket=$1

# How many days to keep the objects
export day=$2
export prefix=$3

if [ -z "$prefix" ]
then
  cmd="aws2 s3api list-objects  --bucket $bucket"
else
  cmd="aws2 s3api list-objects  --bucket $bucket --prefix $prefix"
fi

readarray -t KEYS < <($cmd | jq '.Contents[] |  select( .StorageClass != "STANDARD" ) | ."Key"')
for key in "${KEYS[@]}"; do
  echo "aws s3api restore-object --bucket $bucket --key ${key} --restore-request '{\"Days\":$day,\"GlacierJobParameters\":{\"Tier\":\"Standard\"}}'" >> /tmp/commands.sh
done

echo "Generated file /tmp/commands.sh"

echo "Splitting the huge file into small files: /tmp/sub-commands*"
split -l 1000 /tmp/commands.sh /tmp/sub-commands.sh.
chmod a+x /tmp/sub-commands*


The script will generate a /tmp/commands.sh file with all the commands that you need to run.

When you have a lot of files it may not be possible to run the generated script in one go, because it will be killed at some point. To avoid this, we split /tmp/commands.sh into parts – that is what the last part of the shell script does.

Now use this snippet to run the commands file by file.

for x in /tmp/sub-commands*; do
  echo "working on $x"
  bash "$x"
done

Or, if you have GNU parallel installed, you can run them much faster with:

for x in /tmp/sub-commands*; do
  echo "working on $x"
  parallel -j 10 < "$x"
done
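To check whether a particular object has actually been restored, head-object reports a Restore field (the key here is a placeholder):

aws s3api head-object --bucket $bucket --key "some/prefix/file.gz"
# while the restore is in progress the output contains:
#   "Restore": "ongoing-request=\"true\""
# and once the object is available again:
#   "Restore": "ongoing-request=\"false\", expiry-date=\"...\""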

Update: Make the script work with keys containing spaces

Update2: Make it work with a lot of files and add parallel example

Dynamodb Table Export and local Import

If the Serverless Framework or another CloudFormation setup has problems creating the tables because they already exist, there is another option (a CLI sketch follows below):

  • delete the tables (make a backup)
  • deploy
  • delete the tables (no backup)
  • restore the tables from the backup

Note: The restore could last for hours.
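A rough sketch of that flow with the AWS CLI (the table name matches the one used later in this post; the backup name and the ARN placeholder are mine):

TABLE=essential-events

# 1. take a backup and note the BackupArn in the output
aws dynamodb create-backup --table-name $TABLE --backup-name $TABLE-before-deploy

# 2. delete the table and let the deploy recreate it
aws dynamodb delete-table --table-name $TABLE
# ... run the serverless / CloudFormation deploy here ...

# 3. drop the freshly created, empty table and restore the data
aws dynamodb delete-table --table-name $TABLE
aws dynamodb wait table-not-exists --table-name $TABLE
aws dynamodb restore-table-from-backup \
    --target-table-name $TABLE \
    --backup-arn "<BackupArn from step 1>"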

If you really want to do a dump and restore, continue reading…

Directly into the local Dynamodb

To get your data locally from an AWS DynamoDB table, you should spin up a local DynamoDB server:

docker run -p 8000:8000 amazon/dynamodb-local

Let’s pull the schema and create the table locally:


TABLE=essential-events
aws dynamodb describe-table \
    --table-name $TABLE | \
    jq '.Table | del(.TableId, .TableArn, .ItemCount, .TableSizeBytes, .CreationDateTime, .TableStatus, .ProvisionedThroughput.NumberOfDecreasesToday)' | \
    tee $TABLE-schema.json

aws dynamodb create-table --cli-input-json file://$TABLE-schema.json --endpoint-url http://localhost:8000

aws dynamodb list-tables --endpoint-url http://localhost:8000

Now that we have the local DynamoDB, we can read from AWS and import into it.

The tricky part is that the cloud is hostile: batch-write-item accepts at most 25 items and no more than 16 MB per request, and so on…

So instead of a nice export/import one-liner, we have to write a short script.

This script reads 25 records at a time and imports them into the local DynamoDB.

TABLE=essential-events
maxItems=25
index=0
DATA=$(aws dynamodb scan --table-name $TABLE --max-items $maxItems)
((index+=1)) 
echo $DATA | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000


nextToken=$(echo $DATA | jq -r '.NextToken')
while [[ "${nextToken}" != "null" && "${nextToken}" != "" ]]
do
  DATA=$(aws dynamodb scan --table-name $TABLE --max-items $maxItems --starting-token $nextToken)
  ((index+=1))
  echo $DATA | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
  nextToken=$(echo $DATA | jq -r '.NextToken')
done

By saving it to files

As this will probably be a development table, we would like to do multiple imports from the exported data. Here is a version of the script above which stores the data in files so that you can re-import it later.

TABLE=essential-events
maxItems=25
index=0
DATA=$(aws dynamodb scan --table-name $TABLE --max-items $maxItems)
((index+=1))
echo $DATA | cat > "$TABLE-$index.json"

nextToken=$(echo $DATA | jq -r '.NextToken')
while [[ "${nextToken}" != "null" && "${nextToken}" != "" ]]
do
  DATA=$(aws dynamodb scan --table-name $TABLE --max-items $maxItems --starting-token $nextToken)
  ((index+=1))
  echo $DATA | cat > "$TABLE-$index.json"
  nextToken=$(echo $DATA | jq -r '.NextToken')
done

Now we have a bunch of $TABLE-NNN.json files. When I want to do the import, I run:

for x in *"$TABLE"*.json; do
  cat $x | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
done

How to “truncate” a DynamoDB table

TABLE=essential-events
aws dynamodb describe-table \
    --table-name $TABLE | \
    jq '.Table | del(.TableId, .TableArn, .ItemCount, .TableSizeBytes, .CreationDateTime, .TableStatus, .ProvisionedThroughput.NumberOfDecreasesToday)' | \
    tee $TABLE-schema.json
aws dynamodb delete-table --table-name $TABLE
aws dynamodb create-table --cli-input-json file://$TABLE-schema.json 
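One caveat: delete-table is asynchronous, so create-table can fail while the old table is still being deleted. The standard CLI waiter, placed between the two commands above, avoids that:

aws dynamodb wait table-not-exists --table-name $TABLE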

Credits for the truncation go to https://medium.com/@samnco/deleting-content-in-dynamodb-from-the-cli-831ce5ab083c

Pull remote files from an SFTP server

These days we use the cloud for almost everything, but sometimes we still need to pull files from an SFTP server. Here are two solutions for that.

Pull and remove with sftp

This solution pulls the files and then removes them from the remote. The gotcha is that, if you expect a lot of files, one may arrive while the “get -r …” command is executing, and the subsequent “rm *” will delete it before it was ever downloaded. So this is suitable only if you expect a few files per day or week.

Create a batchfile.sh

get -r upload/* incoming/
rm upload/*

Then add a cron job:

0 5 * * * /usr/bin/sftp -b batchfile.sh username@sftp-corp.company.com

Only pulling with lftp

When I don’t have permission to remove the files from the remote SFTP server, I use the following off-the-shelf approach.

This cron job synchronizes all recent files to /home/USERNAME/incoming:

0 5 * * *  /usr/bin/lftp -u USERNAME,none -e 'mirror --newer-than="now-7days" --only-newer --exclude .ssh --only-missing / /home/USERNAME/incoming; quit' sftp://sftp-corp.company.com

deploy pg gem with postgres 10

When the PostgreSQL in your distribution is stuck at version 10 and you have to upgrade to PostgreSQL 11, a good way to do a Capistrano deploy is like this.

Do the system install with

yum install postgresql10-contrib postgresql10-devel

And then, in your /shared/.bundle/config, add a line showing the location of the PostgreSQL libraries:

---
BUNDLE_PATH: "/opt/application/shared/bundle"
BUNDLE_BUILD__PG: "--with-pg-config=/usr/pgsql-10/bin/pg_config"
BUNDLE_FROZEN: "true"
BUNDLE_JOBS: "4"
BUNDLE_WITHOUT: "development:test"
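If you prefer to let Bundler write that entry itself, something like this should produce it (the exact subcommand depends on your Bundler version):

bundle config --local build.pg "--with-pg-config=/usr/pgsql-10/bin/pg_config"
# on newer Bundler versions:
# bundle config set --local build.pg "--with-pg-config=/usr/pgsql-10/bin/pg_config"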

Thanks to my colleague Kris for finding the solution.

Organizing terraform modules in application stacks for free

One of the big challenges is how to organize your cloud account setup.

In one account you can have a couple of application stacks. The challenge is to be able to apply/plan/destroy them quickly and without any burden.

What works for me is to have one application-terraform_bulk.sh script per stack, which knows which modules belong to that stack, and a general terraform_bulk.sh script which simply applies all the modules in the current folder.

Here is an example.

These are the ECR modules which must be present in this account. I do not care which stack will own them, so I use the general terraform_bulk.sh script.

The commands I can run are:

./terraform_bulk.sh init
# ...It will go in each folder and do terraform init
./terraform_bulk.sh plan
# ...It will go in each folder and do terraform plan
./terraform_bulk.sh apply
./terraform_bulk.sh destroy

Here is how the script looks:

#!/bin/bash
trap "exit" INT

modules=(
  anaconda
  essential
  essential-anaconda-environment
)


terraform_plan() {
  local project="$1"
  pushd .
  cd $project
  terraform plan
  popd
}



terraform_init() {
  local project="$1"
  pushd .
  cd $project
  terraform init
  popd
}


terraform_apply() {
  local project="$1"
  pushd .
  cd $project
  terraform apply -auto-approve
  popd
}


terraform_destroy() {
  local project="$1"
  pushd .
  cd $project
  terraform destroy -auto-approve
  popd
}


terraform_show() {
  local project="$1"
  pushd .
  cd $project
  terraform show
  popd
}


# array=(1 2 3 4)
# reverse array foo
# echo "${foo[@]}"
reverse() {
    # first argument is the array to reverse
    # second is the output array
    declare -n arr="$1" rev="$2"
    for project in "${arr[@]}"
    do
        rev=("$project" "${rev[@]}")
    done
}





case "$1" in
  init)
      for project in "${modules[@]}"
      do
        echo ""
        echo ""
        echo $project
        terraform_init $project
      done

      ;;

  show)
      for project in "${modules[@]}"
      do
        echo ""
        echo ""
        echo $project
        terraform_show $project
      done
      ;;


  apply)
      for project in "${modules[@]}"
      do
        echo ""
        echo ""
        echo $project
        terraform_apply $project
      done
      ;;

  destroy)
      reverse modules reversed_modules
      for project in "${reversed_modules[@]}"
      do
        echo ""
        echo ""
        echo $project
        terraform_destroy $project
      done
      ;;

  plan)
      reverse modules reversed_modules
      for project in "${reversed_modules[@]}"
      do
        echo ""
        echo ""
        echo $project
        terraform_plan $project
      done
      ;;

  *)
      echo $"Usage: $0 {init|apply|destroy}"
      exit 1

esac


In my case the development cloud account has to host two applications, so I just create two versions of the script, like this:

drwxr-xr-x 13 guda guda 4096 Nov 15 13:20 .
drwxr-xr-x  6 guda guda 4096 Nov  5 11:20 ..
drwxr-xr-x  3 guda guda 4096 Oct 28 18:14 athena
drwxr-xr-x  3 guda guda 4096 Jul 10 15:17 cm
drwxr-xr-x  5 guda guda 4096 Dec  5 22:42 ecr
drwxr-xr-x 11 guda guda 4096 Oct 28 18:39 endpoints
-rwxr-xr-x  1 guda guda 2345 Oct 28 18:38 essential-terraform_bulk.sh <<<<<
-rwxr-xr-x  1 guda guda 2190 Oct 28 18:14 etl_monitoring-terraform_bulk.sh <<<<<
drwxr-xr-x  3 guda guda 4096 Nov  5 11:24 fargate_essential
drwxr-xr-x  3 guda guda 4096 Oct 28 18:47 rds
drwxrwxr-x  3 guda guda 4096 Sep  3 19:48 s3
drwxr-xr-x  5 guda guda 4096 Oct 28 18:47 secret_manager
drwxr-xr-x  3 guda guda 4096 Aug 15 17:02 vpc
drwxr-xr-x  4 guda guda 4096 Nov 15 13:20 vpc_peering
drwxr-xr-x  3 guda guda 4096 Aug 19 14:51 zone_security_groups

So when I want to provision:

  • essential app – use essential-terraform_bulk.sh
  • etl monitoring app – use etl_monitoring-terraform_bulk.sh
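Each of those scripts is just a copy of terraform_bulk.sh with its own modules array at the top. For illustration only (these module assignments are invented, not the real ones):

# essential-terraform_bulk.sh
modules=(
  rds
  secret_manager
  fargate_essential
)

# etl_monitoring-terraform_bulk.sh
modules=(
  athena
  endpoints
)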

Be careful when resources are shared – the VPC, for example: you do not want the first terraform_bulk.sh to destroy a resource which is still needed by the second application’s terraform bulk script.

Switch configuration lines using comments

Recently I had a case where I have to use a base Docker image from either a remote or a local repository. I like to keep the configuration in one place and avoid having several configuration files with almost the same content, so I wrote a simple program which switches the config and can switch it back (if needed).

Here is an example of usage:

When I build the images locally I want to use:

FROM anaconda-environment:latest

When I build from our CI/CD server I want to use the remote ECR image:

FROM XXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/anaconda-environment:latest

Then in the Dockerfile I put these lines:

# CONFIG->local-images:FROM anaconda-environment:latest
# CONFIG->remote-images:FROM XXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/anaconda-environment:latest
FROM anaconda-environment:latest
ENV ACCEPT_INTEL_PYTHON_EULA=yes
.... and so on...

And here is how the config is switched to point to the remote images:

ruby switch.rb Dockerfile remote-images

and this is how it is changed back to local images.

ruby switch.rb Dockerfile local-images

So far I haven’t found any drawbacks to this approach. There surely are some – please let me know if you hit one.

And here is the code…

#!/usr/bin/env ruby

# ruby switch.rb Dockerfile remote-images
#
# # CONFIG->local-images:FROM anaconda-environment:latest
# # CONFIG->remote-images:FROM XXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/anaconda-environment:latest
# FROM anaconda-environment:latest




file = ARGV[0] || "Dockerfile"
desired_environment = ARGV[1] || "local"
lines = File.read(file).split("\n")

# Matches lines like "# CONFIG->remote-images:FROM ..." and captures the
# environment key ("remote-images") and the option line ("FROM ...").
tag = /\s*?#\s+CONFIG->([\w\-]+?):(.*)\Z/

new_lines = []
key_found = true
options = {}               # environment key => option line for the current block
found_config_lines = false

lines.each do |line|
  if line =~ tag
    # Collect every CONFIG option of the block we are currently walking through.
    environment_key = $1
    option = $2
    options[environment_key] = option
    found_config_lines = true
  end

  # The first line that matches one of the collected options is the currently
  # active one – replace it with the option for the desired environment.
  if found_config_lines && options.values.include?(line)
    key_found = options.key?(desired_environment)
    new_lines << options[desired_environment]
    found_config_lines = false
    options = {}
  else
    new_lines << line
  end
end

if key_found
  File.write(file, new_lines.join("\n"))
else
  puts "Something is wrong - key not found! The syntax for defining options is:"
  puts "# CONFIG->KEY:The content of this option"
end
