Data preparation

Before using nanocompore, sequencing data have to be basecalled (Albacore or Guppy), aligned to a transcriptome reference and resquiggled with Nanopolish.

To simplify data preprocessing, we wrote a Nextflow pipeline that automates all these steps, as well as additional quality control steps:

Read basecalling

First, raw fast5 reads have to be basecalled with a recent version of the ONT basecaller. Basecalled fast5 files are not required for the rest of the analysis; only the raw fast5 files and the basecalled fastq are needed.

Example with Guppy v2.3.5

guppy_basecaller -i {raw_fast5_dir} -s {dest_dir} --flowcell {flowcell_id} --kit {kit_id} -r --calib_detect --enable_trimming true --trim_strategy rna --reverse_sequence true

Then concatenate the output fastq files into a single file.

cat {dest_dir}/*.fastq > {basecalled_fastq}
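Because every fastq record spans exactly four lines (header, sequence, "+", qualities), concatenating files with cat yields a valid fastq file. The bash sketch below, with a hypothetical helper name `count_fastq_reads`, assumes unwrapped 4-line records and checks that the merged file still has a line count divisible by four before reporting how many reads it contains.

```shell
# Hypothetical helper: number of reads in a fastq file, assuming the usual
# 4-line records (header, sequence, "+", qualities) with unwrapped sequences.
count_fastq_reads() {
    local fastq=$1
    local n_lines
    n_lines=$(wc -l < "$fastq")
    # A truncated or malformed file will usually break the 4-line structure.
    if (( n_lines % 4 != 0 )); then
        echo "error: $fastq has $n_lines lines, not a multiple of 4" >&2
        return 1
    fi
    echo $(( n_lines / 4 ))
}
```

For example, `count_fastq_reads {basecalled_fastq}` should equal the sum of the counts of the individual Guppy files. Keeping the merged file outside the Guppy output directory avoids having a re-run of the cat command pull it back in through the `*.fastq` glob.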

Transcriptome alignment

Basecalled reads have to be aligned to a reference. For dRNA-Seq, reads should be aligned to a reference transcriptome (not genome) in an unspliced fashion. For example, reference transcriptome fasta files for human and mouse can be downloaded directly from GENCODE.

BAM files have to be filtered to remove unmapped, secondary and supplementary alignments, as well as reads mapped to the reverse strand (SAM flag 2324). We also recommend discarding reads with a very low mapping quality (MAPQ < 10). Finally, the reads have to be sorted and indexed.
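The value 2324 is the sum of four SAM FLAG bits: 0x4 (unmapped, 4), 0x10 (reverse strand, 16), 0x100 (secondary, 256) and 0x800 (supplementary, 2048), so `-F 2324` drops any alignment that has at least one of them set. The bash sketch below (function names are hypothetical) decodes these bits and mimics the keep/drop decision of `samtools view -F 2324 -q 10` for a single FLAG and MAPQ pair.

```shell
# Names of the four SAM FLAG bits (per the SAM specification) that add up to 2324.
decode_filter_bits() {
    local flag=$1
    local names=()
    (( flag & 0x4 ))   && names+=(unmapped)        # 4: segment unmapped
    (( flag & 0x10 ))  && names+=(reverse)         # 16: reverse strand
    (( flag & 0x100 )) && names+=(secondary)       # 256: secondary alignment
    (( flag & 0x800 )) && names+=(supplementary)   # 2048: supplementary alignment
    echo "${names[*]}"
}

# Hypothetical predicate mirroring `samtools view -F 2324 -q 10`:
# returns 0 (keep) only if none of the 2324 bits is set and MAPQ >= 10.
keep_alignment() {
    local flag=$1 mapq=$2
    (( (flag & 2324) == 0 && mapq >= 10 ))
}

echo $(( 0x4 | 0x10 | 0x100 | 0x800 ))   # prints 2324
```

For example, `keep_alignment 16 60` fails because the read maps to the reverse strand, and `keep_alignment 0 5` fails because of its low MAPQ.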

Example with Minimap2 v2.16

minimap2 -ax map-ont -L {transcriptome_fasta} {basecalled_fastq} | samtools view -bh -F 2324 -q 10 - | samtools sort -O bam - > {aligned_reads_bam}

samtools index {aligned_reads_bam}

Read indexing and resquiggling

Nanopolish is required to realign the raw signal to the expected reference sequence. For each sample, reads have to be preprocessed with nanopolish 0.10.1 or later: first index the reads with nanopolish index, then resquiggle them with nanopolish eventalign.

Please make sure to use the following options with nanopolish eventalign: --print-read-names, --scale-events and --samples.
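A run that forgot some of these options can be caught from the header of the eventalign output. The bash sketch below (hypothetical helper name) assumes that --print-read-names produces a `read_name` column instead of `read_index` and that --samples adds a `samples` column; --scale-events is assumed to change the values rather than the columns, so it is not checked here.

```shell
# Hypothetical check on the header line of a nanopolish eventalign TSV.
# Assumption: --print-read-names -> "read_name" column, --samples -> "samples" column.
check_eventalign_header() {
    local tsv=$1 header col
    header=$(head -n 1 "$tsv")
    for col in read_name samples; do
        # Split the header on tabs and look for an exact column-name match.
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col"; then
            echo "missing column: $col" >&2
            return 1
        fi
    done
}
```

For example, `check_eventalign_header {eventalign_reads_tsv}` can be run right after eventalign, before spending time on the collapsing step.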

Example with Nanopolish v0.10.1

nanopolish index -s {sequencing_summary.txt} -d {raw_fast5_dir} {basecalled_fastq}

nanopolish eventalign --reads {basecalled_fastq} --bam {aligned_reads_bam} --genome {transcriptome_fasta} --print-read-names --scale-events --samples > {eventalign_reads_tsv}

Finally, the data have to be collapsed per kmer and indexed using the Eventalign_collapse command provided by Nanocompore.
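To illustrate what collapsing per kmer means: nanopolish eventalign can assign several consecutive events of a read to the same reference position, each written on its own line. The awk-based sketch below merges such runs and reports, for each read and position, the number of events, the summed event_length (dwell time) and the mean of the raw samples. It is a conceptual illustration only, assuming columns named contig, position, read_name, event_length and samples; it is not the algorithm, the statistics or the output format of nanocompore Eventalign_collapse.

```shell
# Conceptual illustration only: NOT the algorithm or output format of
# `nanocompore Eventalign_collapse`. Assumes a tab-separated eventalign file
# with a header containing contig, position, read_name, event_length and
# samples (comma-separated raw values, as written with --samples).
collapse_sketch() {
    awk -F'\t' -v OFS='\t' '
        # Print the merged record of the previous (read, contig, position) run.
        function flush(   mean) {
            mean = (ns > 0) ? sum / ns : "NA"
            print cur_read, cur_contig, cur_pos, n, dwell, mean
        }
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # column name -> index
            print "read_name", "contig", "position", "n_events", "dwell", "mean_signal"
            next
        }
        {
            rname = $(col["read_name"]); contig = $(col["contig"]); pos = $(col["position"])
            key = rname SUBSEP contig SUBSEP pos
            if (NR > 2 && key != prev) {            # a new run starts: emit the old one
                flush()
                n = 0; dwell = 0; sum = 0; ns = 0
            }
            prev = key; cur_read = rname; cur_contig = contig; cur_pos = pos
            n++
            dwell += $(col["event_length"])
            m = split($(col["samples"]), s, ",")
            for (j = 1; j <= m; j++) { sum += s[j]; ns++ }
        }
        END { if (NR > 1) flush() }
    ' "$1"
}
```

Grouping only consecutive lines keeps memory use constant, which relies on eventalign writing the events of a read in order along the reference.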

nanocompore Eventalign_collapse -t 6 -i {eventalign_reads_tsv} -o {eventalign_collapsed_reads_tsv}
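Since indexing, resquiggling and collapsing have to be repeated for every sample, a small loop can help. The bash sketch below only prints the commands of this section for each sample name it is given, assuming alignment has already been done; the per-sample directory layout and file names are hypothetical, and the printed script should be reviewed before it is run.

```shell
# Dry-run sketch: print the per-sample preparation commands shown above.
# Hypothetical layout: <sample>/fast5, <sample>/sequencing_summary.txt,
# <sample>/basecalled.fastq and <sample>/aligned.bam already exist.
print_sample_commands() {
    local transcriptome=$1; shift
    local s
    for s in "$@"; do
        printf 'nanopolish index -s %s/sequencing_summary.txt -d %s/fast5 %s/basecalled.fastq\n' "$s" "$s" "$s"
        printf 'nanopolish eventalign --reads %s/basecalled.fastq --bam %s/aligned.bam --genome %s --print-read-names --scale-events --samples > %s/eventalign.tsv\n' "$s" "$s" "$transcriptome" "$s"
        printf 'nanocompore Eventalign_collapse -t 6 -i %s/eventalign.tsv -o %s/eventalign_collapsed.tsv\n' "$s" "$s"
    done
}
```

For example, `print_sample_commands {transcriptome_fasta} ctrl_1 ctrl_2 mod_1 > prepare.sh` writes nine commands that can then be run with `bash prepare.sh`.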

Once this has been done for all your samples, you are ready to run SampComp, the sample comparison command of Nanocompore.