Data_processing_worskhop_PNCC_lite
workshop
PNCC center, July 20-24 2020
Schedule
July 20: cryoSPARC
9 am-11 am cryoSPARC overview, presented by the cryoSPARC CEO and co-founder (RECORDED session)
11 am-1 pm cryoSPARC setup, overview, gain references (ZOOM-LIVE)
July 21: cryoSPARC
9 am-1 pm cryoSPARC training on dataset #1 (ZOOM-LIVE)
July 22: cryoSPARC/RELION
9 am-11 am cryoSPARC training on dataset #2 (ZOOM-LIVE)
11 am-1 pm RELION setup and start of training on dataset #1 (ZOOM-LIVE)
July 23: RELION
9 am-1 pm RELION training on dataset #1, continued. Follow-up on the cryoSPARC processing of dataset #2 (ZOOM-LIVE)
• Location on cascade
/dtemp/emslpXXXXX/workshop_package/
https://www.nature.com/articles/s41586-018-0786-7
To log in:
ssh userID@cascade.emsl.pnl.gov -X -Y
Useful commands:
Single Particle Data Processing (SIMPLIFIED)
Import Movies
Motion correction
Particle picking
2D classification
Ab initio reconstruction
Post-processing
Available software
Available on cascade: cryoSPARC 2.15, RELION 3.1, RELION 3.0, cisTEM
Other platforms: EMAN2, SPHIRE, Scipion, Focus
1) Your cryoSPARC password is saved in the .cryosparc file in your
/home/userID/ directory. Your username is your home institution
email address.
vi .cryosparc
STEP1
• Let’s DIVE IN
In your terminal, type:
cd /dtemp/emslpXXXXX/
ll
cd workshop_package/
Cryosparc
Organization/Recommendations
STEP2
+ Add workspace.
Import
STEP3
Submission lanes
STEP3
For Admins
Motion Correction
STEP4
a) From the “Job Builder” panel
2-in-1: Full-frame + Local motion
(multi): parallelized over multiple GPUs
No need for particle locations, unlike the “Local Motion Correction” job
b) DRAG
Exposure curation (OPTIONAL)
STEP6
STEP7 –TEMPLATE PICKING (ROUTE 1)
• Select “Manual picker” from the JobBuilder
### Box size is about 2x the particle diameter; note that it is in pix. For elongated particles, a factor of 1.5 is often used instead of 2.
For ApoFe, box size = 2x120(A)/0.821(A/pix) = 292 pix
For Connexin, box size = 2x150(A)/0.665(A/pix) = 451 pix. 512
• For ApoFe dataset, aim for 100 particles picked across several micrographs
• Once done, click “Done Picking! Extract Particles”
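The box-size arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of cryoSPARC: the function name `box_size_pix` and its `factor` argument are hypothetical, and the result is simply rounded to the nearest pixel to reproduce the numbers on the slide.

```python
def box_size_pix(diameter_A, pixel_size_A, factor=2.0):
    """Box size in pixels: factor x particle diameter (A) / pixel size (A/pix).

    factor=2.0 follows the rule of thumb above; 1.5 is the value
    mentioned for elongated particles. Hypothetical helper.
    """
    if diameter_A <= 0 or pixel_size_A <= 0:
        raise ValueError("diameter and pixel size must be positive")
    return round(factor * diameter_A / pixel_size_A)


apofe_box = box_size_pix(120, 0.821)     # ApoFe example from the slide
connexin_box = box_size_pix(150, 0.665)  # Connexin example from the slide
print(apofe_box, connexin_box)
```

For an elongated particle, the same helper can be called with `factor=1.5`.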
Particle picking
STEP7 –TEMPLATE PICKING (ROUTE 1)
• Select “2D classification” from the JobBuilder; drag particles from the previous job;
set number of classes to 10 and press Queue.
• Once the previous job is complete, press “Select 2D” from the JobBuilder, drag
particles and class_averages and press Queue.
• Select “Inspect Picks” from the JobBuilder. Drag particles and micrographs from the
previous job and press Queue. Goal is to eliminate false positives. True particles
generally have higher NCC and Power scores.
• Select “Extract from Micrographs” from the JobBuilder. Drag particles and micrographs
from the previous job, specify up to 2 GPUs and the box size and press Queue.
• Select “2D classification” from the JobBuilder. Drag particles from the previous job, specify
up to 2 GPUs and the box size and press Queue.
Particle picking
STEP7 –BLOB PICKING (ROUTE 2)
• Let’s create a new workspace with a new name “blob picking route” to keep
ourselves organized. Link “Curate Exposures” job from the other workspace to your
new workspace.
• In your new workspace, select “Blob picker” from the JobBuilder. Specify minimum
and maximum for the particle diameter and press Queue. (I used 100A and 150A
for ApoFe)
• Perform “Inspect Picks”, “Extract from Micrographs” and “2D classification” as
described for the template picking route (ROUTE 1), but using the current blob-picked particle pool.
Particle picking
STEP7 -TOPAZ
Topaz picks
DRAG
DRAG
Press “Select 2D” from the JobBuilder. Select good classes, press Done.
Good classes should look like 2D projections of your atomic model. You should
see internal features. The quality of the 2D classes defines the quality of
your 3D map.
2D classification
STEP8
The most common questions:
There is no single answer. It depends on your protein sample (protein shape, symmetry and
heterogeneity) and on the quality and size of your image dataset.
Generally, start with a minimum of 2 rounds of 2D classification for highly symmetric molecules such as ApoFe. For
complex proteins, at least 3 to 4 rounds of 2D classification are needed.
The same trend applies to the number of classes: 50 classes are sufficient for more globular, highly symmetric
molecules. For more complex samples and large datasets, 150 to 200 classes are often chosen. We don’t find it
necessary to run more than 250 classes; it becomes computationally expensive and does not provide any additional
data curation benefit.
Ab initio reconstruction/Homogeneous
refinement
STEP10-11
• Select “Ab-initio reconstruction” from the JobBuilder. Drag particles and press
Queue.
### Note that you can set “Number of Ab initio classes” to more than 1 to find multiple
conformations or distinct particles.
• Once the previous job is complete, select “Homogeneous Refinement (NEW)” (for the
ApoFe dataset) from the JobBuilder. Drag particles and a volume, specify
“Refinement box size” and “Symmetry” and press Queue.
Your unsharpened
volume
Sharpening
STEP12
From the “Homogeneous refinement (NEW)” job:
Other useful information
Create an automatic workflow: you don’t have to wait until the job is finished!!!
For questions/errors in cryosparc
Please contact Irina at:
irina.novikova@pnnl.gov
If the problem is regarding a specific job, copy
and paste a job link along with your question.
• Create a destination folder where all your data will
reside. New dataset/protein structure to solve = new
folder.
• The folder organization is critical!!!!
In your terminal, type:
cd /dtemp/emslpXXXXX/
mkdir relion/
cd relion/
mkdir dataset_name/
cd dataset_name/
or
Relion 3.1
Launching GUI
STEP2
Do not launch the GUI on the general login node (glogin). The glogin node is shared between many
supercomputer users and is generally used to browse files, do simple operations and submit
slurm jobs. It is not designed to handle visualization of large images, which causes problems for
other users.
To launch the GUI, you will have to reserve a node just for you by running the following command.
## -W 180 stands for 180 min session
You will see a new terminal window appear (give it a minute), and gXXXX will be displayed
instead.
-r pncc_workshop
JOBS
vi movies.star
Relion 3.1
Motion Correction STEP4
Once the job is submitted via the
slurm scheduler (sbatch), the
GUI can be closed and reopened
I/O tab
if available
preferred, allows Bayesian polishing at the post-processing stage
Running tab
Using these settings, the job takes around 13 min. The output of the job can
be visualized by clicking on MotionCor/job002/ and then on Display: out:
corrected_micrographs.star and logfile.pdf.
Note: to visualize a specific subset of images, use the “Subset selection” job first.
This is particularly important when you have hundreds of images, because relion will
try to read all of them, which takes a significant amount of time.
Relion 3.1
I/O tab
CTF estimation STEP5
Specify Yes
Press RUN
Using these settings, the job takes around 35 sec. The output of the job can
be visualized by clicking on Ctffind/job003/ and then on Display: out:
micrographs_ctf.star.
Relion3.1
Particle picking
STEP6 –MANUAL PICKING
Helpful to play with if the images are noisy
Specify the pixel size
Make sure that you save the coordinates!
Press RUN
Start to use
alias
YES
Laplacian tab
Depends on the contrast of the images. For low contrast micrographs, values of 1.5 may be reasonable
Press RUN
Relion3.1
Particle extraction
STEP7
I/O tab
Press RUN
Relion3.1
Making templates for auto-picking
STEP8
YES
Class
Press RUN
Relion3.1
Auto-picking optimization
STEP9
Before the generated 2D templates are used in the auto-picking procedure, 4 parameters
require optimization:
Picking threshold (how restrictive the particle picking is)
Minimum inter-particle distance (min is 50% of the longest dimension)
Maximum stddev noise
Minimum avg noise
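As a worked example of the second parameter, the lower bound on the inter-particle distance is half of the particle's longest dimension. The sketch below is plain Python, not a RELION call; the function name is hypothetical, and the 120 A value is the ApoFe diameter used in the cryoSPARC box-size example.

```python
def min_inter_particle_distance_A(longest_dimension_A):
    """Lower bound for the minimum inter-particle distance, in Angstrom.

    Uses the 50% rule of thumb stated above. Hypothetical helper.
    """
    if longest_dimension_A <= 0:
        raise ValueError("longest dimension must be positive")
    return 0.5 * longest_dimension_A


print(min_inter_particle_distance_A(120))  # ApoFe, 120 A across
```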
To save time, only a few micrographs will be used for this.
Go to “Auto-picking” job.
Select/afewmic/micrographs_selected.star
Make sure it is a NO
autopicking
Specify YES
Specify YES
KEEP EMPTY
Press RUN, no queue
The job takes around 6 min. The output of the job can be visualized by clicking on
AutoPick/job00X/ and then on Display: out: coords_suffix_autopick.star.
Let’s continue to play with the optimization parameters. Click on your previous
Autopick job (in Finished jobs) and go to “autopicking” tab.
autopicking
Specify NO
Specify YES
Press CONTINUE
Play around with parameters until optimal values are found. Once ready:
Specify NO
Specify NO
Relion3.1
2D classification
STEP11
The main goal is to remove bad particles in order to clean up your data!!!
Optimization I/O
/home/scicons/cascade/apps/chimera/bin/chimera run_it150_class001.mrc
However, we do not advise using it. Mouse movements in this image
visualization software are quite delayed when traveling over the network. It is best
to copy the output file and examine it in Chimera on your local computer.
Open a new terminal window (do not log in to cascade!!!) and copy the file to your local
computer:
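The copy command itself is not shown here. As a rough sketch, it could be an `scp` call run from your local machine. The short Python snippet below only assembles and prints such a command; the remote path under `relion/dataset_name/` and the helper name `build_scp_command` are assumptions for illustration, so substitute your own userID, project number and job folder.

```python
import shlex


def build_scp_command(user_id, remote_path, local_dir="."):
    """Return an scp command that copies one file from cascade to local_dir.

    Hypothetical helper: it only builds the command string, it does not run it.
    """
    host = "cascade.emsl.pnl.gov"  # login host from the "To login" slide
    remote = f"{user_id}@{host}:{remote_path}"
    return " ".join(shlex.quote(part) for part in ["scp", remote, local_dir])


# Assumed location of the map; replace with the path of your own job.
cmd = build_scp_command(
    "userID",
    "/dtemp/emslpXXXXX/relion/dataset_name/run_it150_class001.mrc",
)
print(cmd)
```

The printed line can then be pasted into the local terminal, after which the copied file can be opened in a local Chimera session.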
Relion3.1
3D classification
STEP13
The goal is to identify a homogeneous subset.
I/O
Reference
Even though we know the symmetry, it is best to specify C1 for the first round
CTF
Optimization
Using more classes is useful for heterogeneous datasets. For ApoFe, use just 1
Compute
It is also useful to look at your structure in slice view: click on Class3D/jobXXX/ and then on
Display: out: run_it025_class001.mrc
Relion3.1
High-resolution 3D refinement
STEP14
The goal is to refine a selected homogeneous subset to high resolution.
However, all our particles were previously downscaled. Let’s re-extract
the particles with less or no downscaling.
I/O
extract
$ /dtemp/emslc50414/kschmidt/relion31_cuda/bin/relion_image_handler --i run_half1_class001_unfil.mrc --lowpass 15 --angpix 0.821 --o run_half1_class001_unfil_lowpass.mrc
Relion3.1
Post-processing: sharpening and calculating masked FSC
curves
STEP15
I/O
Mask
Determined in Chimera
Add more pixels to make the mask less tight
Try different values
Check the resulting file (mask.mrc) in Chimera to make sure it encapsulates the entire
structure but has minimal solvent area !!!!
Go to Post-processing job:
I/O
Check the resulting output (logfile.pdf for FSC curves) and the final map in chimera
(postprocess.mrc)
Make sure that the FSC of the phase-randomized maps (red curve) is
close to 0 at the estimated resolution of the map. If it is not, the mask is
too sharp. Re-do the mask by using a stronger low-pass filter or softer
mask parameters.
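The check described above can also be done numerically if the FSC values are available as numbers. The sketch below is an illustration only: the `(resolution, FSC)` pairs are made-up values, not output from this dataset, and the 0.1 tolerance for "close to 0" is an assumed cutoff rather than a RELION default.

```python
def phase_randomized_fsc_ok(curve, estimated_resolution_A, tolerance=0.1):
    """Check that the phase-randomized FSC is near 0 at the estimated resolution.

    curve: list of (resolution_in_A, fsc_value) pairs for the
           phase-randomized maps (the red curve).
    tolerance: assumed cutoff for "close to 0" (hypothetical choice).
    """
    if not curve:
        raise ValueError("curve is empty")
    # Take the sampled point whose resolution is nearest to the estimate.
    _, fsc = min(curve, key=lambda p: abs(p[0] - estimated_resolution_A))
    return abs(fsc) <= tolerance


# Made-up curves for illustration only.
good_mask = [(4.0, 0.60), (3.0, 0.20), (2.4, 0.02), (2.0, 0.00)]
tight_mask = [(4.0, 0.80), (3.0, 0.55), (2.4, 0.30), (2.0, 0.10)]

print(phase_randomized_fsc_ok(good_mask, 2.42))   # mask looks fine
print(phase_randomized_fsc_ok(tight_mask, 2.42))  # mask likely too sharp
```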
Relion3.1
CTF refinement (OPTIONAL)
STEP16
This job can lead to further improvements in resolution (it corrects for
aberrations). Go to the “CTF refinement” job.
I/O
Specify NO initially
Specify NO initially
Fit
Specify YES
Fit
Re-estimate the defocus values for each particle:
Specify NO
Specify YES
Fit
Specify NO
Specify NO
Relion3.1
Bayesian polishing (OPTIONAL)
STEP17
Bayesian polishing is a per-particle, reference-based beam-induced motion correction. IT
TAKES TIME and WILL NOT BE PERFORMED IN THE WORKSHOP.
Run it first in Training mode, then in Polishing mode on the entire dataset.
Training mode:
I/O
Specify YES
Train
RUN
Relion3.1
Re-do high-resolution 3D refinement using polished
particles
STEP18
Go to 3D auto-refine tab:
Use shiny.star
I/O
Specify YES
Submit to queue: YES and press RUN
The job takes ~1 hour 40 min.
The output files run_model.star and run_class001.mrc will be written.
The resolution improved from 2.42 to 2.15!!!
Re-do the postprocess job with shiny particles!!!
Relion3.1
Local resolution estimation
STEP19
Go to “Local Resolution” job:
Specify NO
ResMap
relion_locres.mrc.
Also check the handedness of the structure (there is a 50% chance of it being wrong). To flip:
$ /dtemp/emslc50414/kschmidt/relion31_cuda/bin/relion_image_handler --i PostProcess/jobXXX/postprocess.mrc --o PostProcess/jobXXX/postprocess_invert.mrc --invert_hand
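To illustrate what inverting the hand means: mirroring a volume along a single axis turns it into its mirror image. The sketch below does this on a tiny nested-list "volume" in pure Python. It is not how `relion_image_handler` is implemented, and the choice of the x axis and the helper name `mirror_x` are assumptions made for the example.

```python
def mirror_x(volume):
    """Mirror a volume indexed as volume[z][y][x] along the x axis.

    Reversing one axis produces the mirror image (opposite hand).
    Hypothetical helper operating on nested lists.
    """
    return [[list(reversed(row)) for row in plane] for plane in volume]


# A 2 x 2 x 3 toy volume with distinct values so the flip is visible.
volume = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
flipped = mirror_x(volume)
print(flipped[0][0])
```

Applying the mirror twice returns the original volume, so a wrongly flipped map can always be flipped back.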
Relion3.1
Heterogeneity
All datasets are heterogeneous!!!!
Cryosparc
Useful info
3D variability analysis: https://www.youtube.com/watch?v=0O781Od1z_E