
condor_annex issue with AWS spot fleet requests

Environment

os: centos-release-7-9.2009.1.el7.centos.x86_64
condor: CondorVersion: 8.8.12 Nov 24 2020 BuildID: 524104 PackageID: 8.8.12-1 CondorPlatform: x86_64_CentOS7
aws cli: aws-cli/2.1.19 Python/3.7.3 Linux/3.10.0-1160.11.1.el7.x86_64 exe/x86_64.centos.7 prompt/off

The Issues
1. The instances that are created do not start HTCondor properly.
2. The security groups and tags (such as the instance Name) are not set when the request is made with condor_annex, but they are set when the same request is made with the AWS CLI.

Questions

1. Is this a condor_annex issue, or are we doing something wrong?
2. If we are doing something wrong, what should we change?
3. Can you provide a working scenario for condor_annex with spot instances: how the AWS user needs to be configured on AWS, what configuration to make on the master and the workers, the AWS spot config file, and how to configure condor_annex?

Steps

1. Configure AWS with the user credentials (the same credentials are used for on-demand instances, and that works)

2. Add a resource-based permission for the Lambda function used by HTCondor (ID = our AWS account ID)

aws lambda add-permission --function-name HTCondorAnnex-LambdaFunctions-sfrLeaseFunction-1UR5MEAYLWXXF \
    --statement-id xaccount --action lambda:InvokeFunction --principal arn:aws:iam::ID:user/condor-annex --output text

3. Add the following to the HTCondor configuration

ANNEX_DEFAULT_ODI_INSTANCE_PROFILE_ARN=aws-ec2-spot-fleet-tagging-role

4. Start annex

We start a condor_annex spot fleet using the following command:

condor_annex -aws-spot-fleet -aws-spot-fleet-config-file /home/centos/config.json -slots 1 -annex-name Test_Annex

We generated the config file using the AWS spot request UI, as the HTCondor documentation recommends.

With the AWS CLI, the same config file creates an instance with all the fields that we provide (such as security groups and Name):

aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json
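To compare the two launch paths, it can help to list what the config file actually asks for, per launch specification, and then check each running instance against that list. The following is a minimal Python sketch under that assumption. The function name `expected_settings` is hypothetical, and the field names are the ones used in the config file in this report.

```python
def expected_settings(config):
    """Summarize the security groups and instance tags each launch spec requests."""
    rows = []
    for spec in config.get("LaunchSpecifications", []):
        # Security groups are listed per network interface in this config.
        groups = []
        for interface in spec.get("NetworkInterfaces", []):
            groups.extend(interface.get("Groups", []))
        # Only tag specifications with ResourceType "instance" apply to instances.
        tags = {}
        for tag_spec in spec.get("TagSpecifications", []):
            if tag_spec.get("ResourceType") == "instance":
                for tag in tag_spec.get("Tags", []):
                    tags[tag["Key"]] = tag["Value"]
        rows.append({
            "InstanceType": spec.get("InstanceType"),
            "Groups": groups,
            "Tags": tags,
        })
    return rows
```

Loading config.json with `json.load` and passing the result to this function gives one row per launch specification, which can be compared by hand with what the instances started by condor_annex and by the AWS CLI report.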


Notes

1. The AMI that we use works with on-demand instances.

2. The same AMI does not work with the condor_annex spot fleet settings. Below is the service status output from the worker:

[ec2-user@ip-172-31-44-184 ~]$ systemctl status condor.service
● condor.service - Condor Distributed High-Throughput-Computing
   Loaded: loaded (/usr/lib/systemd/system/condor.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

[ec2-user@ip-172-31-44-184 ~]$ systemctl status condor-annex-ec2.service
● condor-annex-ec2.service - Boot-time configuration for an HTCondor annex instance
   Loaded: loaded (/usr/lib/systemd/system/condor-annex-ec2.service; enabled; vendor preset: disabled)
   Active: activating (start) since Jo 2021-01-21 10:04:26 UTC; 3min 14s ago
 Main PID: 4297 (condor-annex-ec)
    Tasks: 2
   Memory: 12.6M
   CGroup: /system.slice/condor-annex-ec2.service
           ├─4297 /bin/sh /usr/libexec/condor/condor-annex-ec2 start
           └─5512 sleep 20

ian 21 10:04:33 ip-172-31-44-184.eu-west-1.compute.internal condor-annex-ec2[4297]: Configuring HTCondor to be an EC2 annex:
Unable to locate credentials. You can configure credentials by running "aws configure".

3. The /usr/libexec/condor/condor-annex-ec2 start command hangs on the worker created by the spot fleet:

[ec2-user@ip-172-31-44-184 ~]$ ps aux | grep condor
root 4297 0.0 0.1 124316 3248 ? Ss 10:04 0:00 /bin/sh /usr/libexec/condor/condor-annex-ec2 start
ec2-user 5551 0.0 0.0 119416 920 pts/0 S+ 10:09 0:00 grep --color=auto condor
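
The "Unable to locate credentials" message in note 2 is printed by the AWS CLI on the worker. One possible cause, stated here as an assumption and not as a confirmed diagnosis, is that the spot instances are launched without an IAM instance profile, so the boot script has no credentials to use. The sketch below lists the launch specifications in a spot fleet config that carry no `IamInstanceProfile` entry. The function name `specs_missing_instance_profile` is hypothetical.

```python
def specs_missing_instance_profile(config):
    """Return (index, instance type) for launch specs with no IamInstanceProfile."""
    missing = []
    for index, spec in enumerate(config.get("LaunchSpecifications", [])):
        # An instance profile can be given by Arn or by Name; accept either.
        profile = spec.get("IamInstanceProfile") or {}
        if not (profile.get("Arn") or profile.get("Name")):
            missing.append((index, spec.get("InstanceType", "unknown")))
    return missing
```

An empty result means every launch specification names an instance profile. A non-empty result does not prove that this is the cause of the hang, only that it is worth checking.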

Config file (with ID and security groups removed)

{
"IamFleetRole": "arn:aws:iam::ID:role/aws-ec2-spot-fleet-tagging-role",
"AllocationStrategy": "capacityOptimized",
"TargetCapacity": 1,
"SpotPrice": "0.0488",
"ValidFrom": "2021-01-19T08:11:53Z",
"ValidUntil": "2022-01-19T08:11:53Z",
"TerminateInstancesWithExpiration": true,
"LaunchSpecifications": [
{
"ImageId": "ami-0ecfbf71e15473323",
"InstanceType": "a1.medium",
"KeyName": "k8s",
"SpotPrice": "0.0288",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"VolumeType": "gp2",
"VolumeSize": 64,
"SnapshotId": "snap-004fa2168b4ae3241"
}
}
],

"NetworkInterfaces": [
{
"DeviceIndex": 0,
"SubnetId": "subnet-1d4f2845",
"DeleteOnTermination": true,
"Groups": [
"sg-ID1",
"sg-ID2"
],
"AssociatePublicIpAddress": true
}
],
"TagSpecifications": [
{
"ResourceType": "instance",
"Tags": [
{
"Key": "Name",
"Value": "Spot_test"
}
]
}
]
},
{
"ImageId": "ami-0ecfbf71e15473323",
"InstanceType": "t2.small",
"KeyName": "k8s",
"SpotPrice": "0.025",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"VolumeType": "gp2",
"VolumeSize": 64,
"SnapshotId": "snap-004fa2168b4ae3241"
}
}
],
"NetworkInterfaces": [
{
"DeviceIndex": 0,
"SubnetId": "subnet-1d4f2845",
"DeleteOnTermination": true,
"Groups": [
"sg-ID1",
"sg-ID2"
],
"AssociatePublicIpAddress": true
}
],
"TagSpecifications": [
{
"ResourceType": "instance",
"Tags": [
{
"Key": "Name",
"Value": "Spot_test"
}
]
}
]
},

{
"ImageId": "ami-0ecfbf71e15473323",
"InstanceType": "c6gd.medium",
"KeyName": "k8s",
"SpotPrice": "0.0436",
"BlockDeviceMappings": [
{
"DeviceName": "/dev/xvda",
"Ebs": {
"DeleteOnTermination": true,
"VolumeType": "gp2",
"VolumeSize": 64,
"SnapshotId": "snap-004fa2168b4ae3241"
}
}
],
"NetworkInterfaces": [
{
"DeviceIndex": 0,
"SubnetId": "subnet-26376b50",
"DeleteOnTermination": true,
"Groups": [
"sg-ID1",
"sg-ID2"
],
"AssociatePublicIpAddress": true
}
],
"TagSpecifications": [
{
"ResourceType": "instance",
"Tags": [
{
"Key": "Name",
"Value": "Spot_test"
}
]
}
]
}
],
"Type": "request",
"TagSpecifications": [
{
"ResourceType": "spot-fleet-request",
"Tags": [
{
"Key": "Name",
"Value": "Spot_test"
}
]
}
]
}
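
Because the file was edited by hand after it was exported from the UI (IDs and security groups were removed), a quick structural check can rule out JSON problems before the file is passed to condor_annex or the AWS CLI. The following is a minimal sketch that uses only the Python standard library. The helper names `find_duplicate_keys` and `check_config` are hypothetical.

```python
import json


def find_duplicate_keys(text):
    """Parse JSON text and return (parsed value, list of duplicated keys)."""
    duplicates = []

    def record_pairs(pairs):
        # The json module calls this hook once per JSON object,
        # passing the object's (key, value) pairs in document order.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    data = json.loads(text, object_pairs_hook=record_pairs)
    return data, duplicates


def check_config(text):
    """Return a list of human-readable problems; empty when none are found."""
    try:
        _, duplicates = find_duplicate_keys(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    return [f"duplicate key: {key}" for key in duplicates]
```

Reading config.json into a string and passing it to `check_config` reports either the position of the first syntax error or any keys that appear twice within one object. It does not check that the fields are valid for a spot fleet request.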
